Cloud is as much about efficiency as it is about scale

Thu, August 29, 2024 - 3 min read

Overprovisioning problem

Azure (and cloud in general) can be very expensive. But if you pick the correct performance SKUs for your workload and use features like auto-pause on serverless Azure SQL databases where feasible (for example in non-prod environments), the average large business can potentially save millions. Millions that you could spend on your already overworked CCOE team, for example on an extra resource or two! ;)

The image below represents a potential saving of roughly $200k/year in Azure SQL costs. Measured over the last 30 days, the databases use on average less than 1% of their provisioned capacity.

First version of a unified dashboard:


How did it get like this?

In most cases, the answer is that nobody focuses on optimizing cost and performance. Developers who just want their application to perform well overprovision their infrastructure to be “on the safe side”, or lack experience with cloud infrastructure and its best practices. After all, “I’m a Java developer, not an infrastructure developer”. Sure, totally understandable.

Know your database

Another common error I see in large migration projects, where we move multiple databases from a single on-prem server to the cloud, is that suddenly every Azure SQL database is provisioned with the power of that entire on-prem server. Why? Because nobody knows exactly how much performance each individual database needs. On-prem, they just added an extra vCore to the server when it started to get slow. Now we need granular insight into each database to pick the most cost-effective SKU, and that requires monitoring over time.
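A minimal sketch of that per-database insight, assuming you already export utilization samples (for example the `cpu_percent` metric that Azure Monitor reports for Azure SQL databases) into simple records. The `Sample` type, the `summarize` and `overprovisioned` functions and the 40% threshold are hypothetical illustrations, not part of any Azure SDK:

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    database: str
    cpu_percent: float  # utilization of the provisioned compute, 0-100


def summarize(samples: list[Sample]) -> dict[str, dict[str, float]]:
    """Group samples per database and report average and 95th-percentile utilization."""
    by_db: dict[str, list[float]] = {}
    for s in samples:
        by_db.setdefault(s.database, []).append(s.cpu_percent)

    summary: dict[str, dict[str, float]] = {}
    for db, values in by_db.items():
        ordered = sorted(values)
        rank = math.ceil(0.95 * len(ordered))  # nearest-rank percentile (1-based)
        summary[db] = {"avg": mean(values), "p95": ordered[rank - 1]}
    return summary


def overprovisioned(summary: dict[str, dict[str, float]], p95_threshold: float = 40.0) -> list[str]:
    """Databases whose 95th-percentile utilization stays below a hypothetical threshold."""
    return sorted(db for db, stats in summary.items() if stats["p95"] < p95_threshold)


if __name__ == "__main__":
    samples = [
        Sample("orders", 0.5), Sample("orders", 1.5),
        Sample("billing", 55.0), Sample("billing", 80.0),
    ]
    stats = summarize(samples)
    print(stats)
    print(overprovisioned(stats))
```

Looking at a high percentile as well as the average matters: a database that averages 1% but peaks near 100% during a nightly batch is not a safe downsizing candidate.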

Use environment parameters

Another result can be that the dev, test and stage environments run the same database SKU as production. When a production database costs $700+/month, this quickly adds up for databases that spend most of their time idling. Then it is left running for five years, until someone raises an eyebrow in a board meeting at the bill from their friendly cloud provider and ultimately decides to go back to on-prem.
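To make “this quickly adds up” concrete, here is the arithmetic for three non-prod copies (dev, test, stage) of a $700/month database left running for five years. The 25% active fraction for an auto-paused database is a hypothetical figure, and the calculation deliberately ignores that storage is still billed while paused and that serverless compute is priced differently:

```python
def lifetime_cost(monthly_cost: float, copies: int, years: int) -> float:
    """Total bill for `copies` databases at `monthly_cost` each over `years` years."""
    return monthly_cost * copies * 12 * years


# Dev, test and stage each run a copy of a $700/month production database for 5 years.
always_on = lifetime_cost(700.0, copies=3, years=5)

# Hypothetical: with auto-pause the database is active ~25% of the time. Treating the
# whole $700 as compute that is billed only while active is a rough simplification.
ACTIVE_FRACTION = 0.25
with_auto_pause = always_on * ACTIVE_FRACTION

print(f"always on:  ${always_on:,.0f}")
print(f"auto-pause: ${with_auto_pause:,.0f} (rough estimate)")
```

Under these assumptions that is $126,000 versus roughly $31,500, for the non-prod copies of a single database.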

So please help save the reputation of cloud computing by:

  • Regularly measure your databases' actual utilization against their provisioned capacity.
  • Enable features that put your database to sleep when it is not used, like the “auto-pause” feature of the serverless Azure SQL SKU, in non-prod.
  • Build a culture around efficiency among the landing zone developers.
  • Configure budget alerts on your subscription.
  • Make sure your IaC pipelines support different parameters for different environments.
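As a sketch of what per-environment parameters can look like, the snippet below keeps one parameter set per environment and checks that no non-prod environment runs without auto-pause. The SKU names follow Azure SQL's naming pattern (`GP_S_Gen5_*` for serverless General Purpose, `GP_Gen5_*` for provisioned), and -1 mirrors how Azure disables auto-pause, but the sizes, environment names and helper functions are hypothetical placeholders. In practice these values would be fed to whichever IaC tool you use (Bicep, Terraform, etc.):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SqlParams:
    sku_name: str              # serverless ("GP_S_...") or provisioned ("GP_...")
    auto_pause_delay_min: int  # minutes idle before pausing; -1 disables auto-pause


# Hypothetical parameter sets; the sizes are placeholders, not recommendations.
ENVIRONMENTS: dict[str, SqlParams] = {
    "dev": SqlParams(sku_name="GP_S_Gen5_2", auto_pause_delay_min=60),
    "test": SqlParams(sku_name="GP_S_Gen5_2", auto_pause_delay_min=60),
    "stage": SqlParams(sku_name="GP_S_Gen5_2", auto_pause_delay_min=60),
    "prod": SqlParams(sku_name="GP_Gen5_4", auto_pause_delay_min=-1),
}


def params_for(environment: str) -> SqlParams:
    """Return the parameters for one environment, failing loudly on typos."""
    try:
        return ENVIRONMENTS[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None


def always_on_non_prod(envs: dict[str, SqlParams]) -> list[str]:
    """List non-prod environments that would run 24/7 because auto-pause is off."""
    return sorted(
        name for name, p in envs.items()
        if name != "prod" and p.auto_pause_delay_min < 0
    )


if __name__ == "__main__":
    print(params_for("dev"))
    print(always_on_non_prod(ENVIRONMENTS))  # [] when every non-prod env can pause
```

Running a check like `always_on_non_prod` in the pipeline turns "non-prod should sleep" from a convention into something a build can enforce.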

The dashboard image in this post is generated from a personal project that compares provisioned capacity with actual average usage based on metrics, and recommends a more cost-effective SKU. In this case, by aiming for 70% utilization, roughly $200k could potentially be saved annually across 200+ databases.
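The core of such a recommendation can be sketched in a few lines. This is not the project's actual code: it assumes load scales linearly with vCores, and the size list and per-vCore price are hypothetical placeholders rather than real Azure pricing. Only the 70% target comes from the text above:

```python
TARGET_UTILIZATION = 0.70  # aim for ~70% average utilization

# Hypothetical: available vCore sizes and a flat monthly price per vCore.
# Replace with the real SKU list and pricing for your region and tier.
VCORE_SIZES = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 24, 32, 40, 80]
PRICE_PER_VCORE_MONTH = 180.0  # USD, placeholder


def recommend_vcores(provisioned: int, avg_utilization: float) -> int:
    """Smallest size at which the observed load would sit at or below the target.

    avg_utilization is the measured average as a fraction of the provisioned
    capacity (0.01 == 1%). Assumes load scales linearly with vCores.
    """
    used = provisioned * avg_utilization  # vCores actually in use on average
    needed = used / TARGET_UTILIZATION    # capacity that puts that load at ~70%
    for size in VCORE_SIZES:
        if size >= needed:
            return size
    return VCORE_SIZES[-1]  # load exceeds the largest size; cap at the maximum


def annual_saving(provisioned: int, avg_utilization: float) -> float:
    """Yearly cost difference; negative means the database should scale up."""
    recommended = recommend_vcores(provisioned, avg_utilization)
    return (provisioned - recommended) * PRICE_PER_VCORE_MONTH * 12


if __name__ == "__main__":
    # (provisioned vCores, average utilization) per database: illustrative values only
    fleet = [(8, 0.01), (8, 0.005), (4, 0.90)]
    for vcores, util in fleet:
        print(vcores, util, recommend_vcores(vcores, util), annual_saving(vcores, util))
    print("total:", sum(annual_saving(v, u) for v, u in fleet))
```

Sizing on the average gives the biggest savings but can under-size spiky workloads; feeding a high percentile of utilization into `recommend_vcores` instead of the average is a more conservative variant.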

Final words:

Cloud is not only about scale, it's also about efficiency. If you are paying more for your infrastructure in the cloud than you did on-prem, you're doing something wrong.