Valérian de Thézan de Gaussan · Data Engineering for process-heavy organizations

The two-fold cost analysis that I use for any data infrastructure.

It yields a lot of value, mostly by removing stuff.

1️⃣ First fold: tool-based cost analysis.

👉 Figure out how much each tool is costing.

If you use cloud-based services, cloud providers will give you a cost breakdown for each service. Easy.

If you run on-premises, figure out the price of the infrastructure the tool runs on, and what percentage of that infrastructure the tool requires. Fairly easy.
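For the on-premises case, the allocation is just a proportional split. Here is a minimal sketch, assuming you already know the total monthly infrastructure cost and a rough usage share per tool. The tool names, the figures, and the `allocate_costs` helper are all hypothetical and only there for illustration.

```python
# Hypothetical figures: replace them with your own numbers.
monthly_infra_cost = 12_000.0  # assumed total monthly cost of the on-prem cluster

# Assumed share of the cluster each tool requires (should sum to at most 1.0).
usage_share = {"airflow": 0.10, "postgres": 0.25, "spark": 0.50}


def allocate_costs(total_cost, shares):
    """Split a shared infrastructure cost across tools by usage share."""
    if sum(shares.values()) > 1.0 + 1e-9:
        raise ValueError("usage shares exceed 100% of the infrastructure")
    return {tool: total_cost * share for tool, share in shares.items()}


tool_costs = allocate_costs(monthly_infra_cost, usage_share)

# Print the most expensive tools first.
for tool, cost in sorted(tool_costs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tool}: {cost:.2f} per month")
```

Whatever share is left unallocated (15% in this made-up example) is idle capacity, which is worth tracking as its own line.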

With this analysis done, first look for aberrant values. If one of your tools is costing 90% over budget, focus on that.
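Spotting those aberrant values can be as simple as comparing each tool's cost to its budget. A rough sketch, assuming you have a per-tool budget to compare against. The `over_budget` helper, the 50% threshold, and all the numbers are assumptions for illustration, not a recommended cutoff.

```python
def over_budget(costs, budgets, threshold=0.5):
    """Return tools whose cost exceeds their budget by more than `threshold`.

    The overrun is expressed as a fraction of the budget (0.9 means 90% over).
    """
    flagged = {}
    for tool, cost in costs.items():
        budget = budgets[tool]
        overrun = (cost - budget) / budget
        if overrun > threshold:
            flagged[tool] = overrun
    return flagged


# Hypothetical monthly figures.
costs = {"warehouse": 9500.0, "orchestrator": 800.0, "bi_tool": 1200.0}
budgets = {"warehouse": 5000.0, "orchestrator": 1000.0, "bi_tool": 1000.0}

flagged = over_budget(costs, budgets)
for tool, overrun in flagged.items():
    print(f"{tool} is {overrun:.0%} over budget")
```

With these made-up numbers, only the warehouse gets flagged: it is 90% over budget, while the BI tool is 20% over and stays under the threshold.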

You also now know the overall cost of your data infrastructure, if you didn't before.

2️⃣ The second fold, way less common, is the process-based cost analysis.

👉 Instead of focusing on the tools, we’re going to focus on the ROI of the data processes.

If an ELT process uses a lot of capacity and is only used in a dashboard that nobody reads, maybe it’s time to let go of that process.
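One way to surface those processes is to put each one's cost next to how much its output is actually consumed. A minimal sketch, assuming you can estimate a monthly cost per process and count views on whatever it feeds. The `Process` fields, the `removal_candidates` helper, the 5-view cutoff, and the data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    monthly_cost: float  # estimated cost of running the process
    monthly_views: int   # views of the dashboards or reports it feeds


def removal_candidates(processes, min_views=5):
    """Return barely-consumed processes, most expensive first."""
    unused = [p for p in processes if p.monthly_views < min_views]
    return sorted(unused, key=lambda p: p.monthly_cost, reverse=True)


# Hypothetical inventory of data processes.
processes = [
    Process("sales_elt", monthly_cost=2400.0, monthly_views=310),
    Process("legacy_kpi_elt", monthly_cost=1800.0, monthly_views=0),
    Process("marketing_export", monthly_cost=150.0, monthly_views=2),
]

for p in removal_candidates(processes):
    print(f"{p.name}: {p.monthly_cost:.2f} per month, {p.monthly_views} views")
```

View counts are only a proxy for value. A process with few readers can still be critical, so treat the list as a set of questions to ask, not as an automatic kill list.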

This type of analysis is harder to conduct, but it raises questions that nobody has asked before.

Finally, it provides a comprehensive perspective on the entire data infrastructure.

📉 As a result, you often get to remove tools and processes that were eating a good chunk of the budget, and you now have an effective method to manage expenses in the future.

Here’s an example of the aggregated cost of multiple AWS services reduced by 3.5x, by removing services that were not bringing value and optimizing those that do bring value.