Remember the cloud's promise to save us money? It was meant to be the efficiency play of all time. But bad news is hiding in the bills of countless firms. The very systems that were designed to scale, serverless and event-driven architectures, are now causing financial havoc. Why are so many IT leaders staring at bills that tripled overnight with no significant increase in traffic? The answer lies deeper than you might assume.
The Scalability Trap
Serverless computing was sold to us as a dream. You pay only for what you consume, down to the millisecond. In theory, it’s a perfect model. In practice, it puts an invisible tax on every digital action. A single user upload, a small API request, a database event: each one costs money. That granularity is the essence of the problem. We built systems that scale without limit, while our budgets do not scale at all.
- Cloud architects at one large fintech have observed that they stopped managing cattle and started managing ants. Managing the finances of an ant colony is an entirely different job.
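To make the pay-per-millisecond model concrete, here is a minimal sketch of how per-invocation charges add up over a month. The two rates are placeholder values chosen for illustration, not any provider's actual pricing, and the function names are hypothetical.

```python
# Assumed, illustrative rates. Real provider pricing differs and changes over time.
PRICE_PER_GB_SECOND = 0.0000167  # assumption: cost of 1 GB of memory for 1 second
PRICE_PER_REQUEST = 0.0000002    # assumption: flat fee charged per invocation


def invocation_cost(duration_ms: float, memory_mb: int) -> float:
    """Cost of one function call: memory held for the billed duration, plus the request fee."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST


def monthly_cost(invocations_per_day: int, duration_ms: float,
                 memory_mb: int, days: int = 30) -> float:
    """Scale the per-call cost up to a month of traffic."""
    return invocations_per_day * days * invocation_cost(duration_ms, memory_mb)


if __name__ == "__main__":
    # One million calls a day, 200 ms each, 512 MB of memory.
    print(f"{monthly_cost(1_000_000, 200, 512):.2f}")
```

Each call costs a tiny fraction of a cent, which is exactly why it escapes notice. The total only becomes visible once the call count is multiplied in.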
The Event-Driven Domino Effect
Modern applications are webs of microservices. Consider a basic action such as a user uploading a profile picture. That does not trigger just a single function. It can set off a cascade. First, one function validates the file. Then another resizes it. Next, an AI service scans its content. Finally, a notification is dispatched. One event has turned into four or five billable invocations. This domino effect is devastating pre-allocated cloud budgets. It is beautiful in its complexity and brutal in its cost.
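The cascade described above can be sketched as a small event graph. The function names and the wiring between them are assumptions made for illustration; a real pipeline will differ, and this sketch assumes the graph has no cycles.

```python
from collections import deque

# Hypothetical event graph: each key triggers the functions listed as its value.
PIPELINE = {
    "upload": ["validate_file"],
    "validate_file": ["resize_image"],
    "resize_image": ["scan_content"],
    "scan_content": ["send_notification"],
    "send_notification": [],
}


def count_invocations(graph: dict, trigger: str) -> int:
    """Count every function invocation caused by one trigger event (acyclic graphs only)."""
    invocations = 0
    queue = deque(graph.get(trigger, []))
    while queue:
        function_name = queue.popleft()
        invocations += 1
        # Each invoked function may itself emit events that trigger more functions.
        queue.extend(graph.get(function_name, []))
    return invocations


if __name__ == "__main__":
    per_upload = count_invocations(PIPELINE, "upload")
    uploads_per_day = 50_000  # assumed traffic figure
    print(per_upload, per_upload * uploads_per_day)
```

With this graph, one upload produces four invocations. Budgeting per user action rather than per invocation underestimates spend by that multiplier.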
Where Did We Lose the Monitoring?
Conventional IT monitoring tools are virtually blind to this problem. They were built for a world of long-running virtual machines. They track large, predictable assets. As a result, they do not see the tsunami of micro-transactions. Your total compute spend may look stable. Meanwhile, a single line item buried in the bill, “Lambda-GB-Seconds”, has exploded. This is a critical visibility gap. How can you manage what you cannot even see?
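One way to close that gap is to compare spend per usage type instead of watching the total. The sketch below uses synthetic numbers; only the “Lambda-GB-Seconds” label comes from the text above, and the other labels, the threshold, and the function name are assumptions.

```python
def find_spikes(previous: dict, current: dict, threshold: float = 2.0) -> dict:
    """Return usage types whose cost grew by at least `threshold` times, with the growth ratio."""
    spikes = {}
    for usage_type, cost in current.items():
        before = previous.get(usage_type, 0.0)
        # Usage types with no prior spend are skipped here; they need separate handling.
        if before > 0 and cost / before >= threshold:
            spikes[usage_type] = cost / before
    return spikes


# Synthetic billing data for illustration only.
last_month = {"VM-Hours": 9000.0, "Storage-GB": 800.0, "Lambda-GB-Seconds": 200.0}
this_month = {"VM-Hours": 8200.0, "Storage-GB": 800.0, "Lambda-GB-Seconds": 1000.0}

if __name__ == "__main__":
    print(sum(last_month.values()), sum(this_month.values()))
    print(find_spikes(last_month, this_month))
```

In this synthetic data the two monthly totals are identical, so a dashboard that tracks only the total shows nothing unusual, while the per-line comparison flags a fivefold increase in one item.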
The Data Analytics and AI Amplifier
This is where the problem gets dramatically worse. AI and complex Data Analytics workloads are event-driven by nature. A single call to a serverless AI endpoint for image recognition is not just one call. It may involve several chained functions, each with a large memory footprint. These are not cheap operations. They are the Ferraris of the cloud: powerful but extremely expensive, and you pay for every minute the engine is running, even when you are just waiting at a red light.
A Real-World Case Study
Take the case of a mid-sized e-commerce platform. They added an AI-based recommendation engine. The goal was to boost sales. It worked: conversions rose 15 percent. However, their monthly cloud bill climbed to more than 65,000, up from 15,000. The culprit? Their serverless AI model was being invoked not only on product pages but on every page load across the site, including the homepage and cart. The AI service alone cost more than all of their other IT infrastructure. Success nearly ruined them.
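The failure mode in this case study, calling the model on every page load, suggests a simple guard. The sketch below is one possible approach, not a description of what the platform actually did; the page-type labels and function names are hypothetical.

```python
# Assumption: only product pages benefit from live recommendations.
AI_ENABLED_PAGE_TYPES = {"product"}


def should_call_recommender(page_type: str) -> bool:
    """Allow the expensive AI call only for page types on the allow-list."""
    return page_type in AI_ENABLED_PAGE_TYPES


def count_ai_calls(page_views: list) -> int:
    """Number of AI invocations a stream of page views would cause with the guard in place."""
    return sum(1 for page_type in page_views if should_call_recommender(page_type))


if __name__ == "__main__":
    # Synthetic traffic sample for illustration.
    views = ["home", "product", "cart", "product", "home", "search"]
    print(len(views), count_ai_calls(views))
```

In the synthetic sample, six page views lead to two AI calls instead of six. How much the guard saves in practice depends entirely on the real mix of page types.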
The Battle for Smarter Cloud Computing
So, what’s the solution? It is not to abandon innovation. It is to embrace a newer discipline commonly known as FinOps. This isn’t about cost-cutting. It is about financial accountability and smarter architectural choices. Teams are implementing intelligent throttling and batching events in queues. They are right-sizing the memory allocations of their functions. A 10 percent change in one place can yield a 30 percent cost reduction in another. Tooling is finally catching up, with platforms such as CloudZero providing the visibility required at a granular level.
- The discussion has shifted from “can we build it?” to “at what scale can we build it?”, a DevOps team lead posted on a recent forum.
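Of the techniques mentioned above, event batching is the easiest to show in a few lines. This is a minimal sketch that groups queued events so one invocation handles many; the function name and batch size are assumptions, and a production queue consumer would also need a time-based flush and error handling.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_events(events: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a stream of events into lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit whatever is left so that no event is dropped.
    if batch:
        yield batch


if __name__ == "__main__":
    batches = list(batch_events(range(10), 4))
    print(len(batches), batches)
```

Ten events with a batch size of four become three invocations instead of ten. The trade-off is added latency, since an event may wait for its batch to fill.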
A New Mandate for Network Administration
This changes the fundamental role of IT and Network Administration. The job is no longer only about uptime and security. It is now also about the economics of dynamic systems. Engineers need to be empowered to understand the financial impact of their code. Architects need to design for cost efficiency as well as resilience and speed. This requires a cultural shift. It demands cooperation among finance, development, and operations as never before.
Final Thought
The promise of the cloud was not a lie, but it was incomplete. We gained remarkable agility and power. In exchange, we took on a new layer of financial complexity. The businesses that succeed will be those that recognize this is not merely a technical problem. It’s a business one. They will build cost intelligence into their development lifecycle. The question for every leader is this: Are you building a system you can scale, or one you can afford? The future of your IT strategy depends on the answer.