“Cleanliness and consistency of the data layer is the most important thing in building an agentic FinOps system.”

Sanjeev Mittal, CPTO at CloudKeeper, highlights how AI-driven FinOps is evolving through conversational cloud intelligence, unified multi-cloud data layers, and agentic automation.

Sanjeev Mittal, CPTO, CloudKeeper

FinOps is rapidly evolving from a cost-monitoring function into a strategic operational discipline. The growing complexity of multi-cloud infrastructure, rising GPU consumption, and the expansion of agentic AI workloads are forcing organisations to rethink how they manage visibility, optimisation, and governance across cloud ecosystems. At the same time, conversational AI interfaces are beginning to change how finance and engineering teams interact with cloud intelligence, making real-time decision-making more accessible at the executive level.

In this interaction with CIO&Leader, Sanjeev Mittal discusses how CloudKeeper is building AI-driven FinOps capabilities for multi-cloud enterprises through platforms such as Lens GPT, Tuner, and Comet. He explains the architectural importance of clean and consistent data layers, the shift from dashboard-driven FinOps to conversational cloud intelligence, and why the future of autonomous infrastructure will still require human oversight despite rapid advances in agentic AI systems.

CIO&Leader: CloudKeeper recently extended Lens GPT capabilities for multi-cloud environments. What architectural or data-tagging challenges did your team face while building an agentic FinOps platform capable of maintaining a unified conversational experience across AWS, GCP, and Azure?

Sanjeev Mittal: Lens GPT works across the major clouds. For us, AWS is nearly 20 times larger than GCP from a workload perspective, and Azure support is also being launched.

The challenge fundamentally comes down to maintaining conversational consistency across different cloud environments. For example, when a customer asks, “What caused my cloud cost spike over the last week?”, the platform needs to understand whether the customer operates only on AWS, or across AWS and GCP together.

The key differentiator lies in our data architecture. We built a consistent data layer across all clouds. The underlying data structures remain standardised irrespective of whether the customer is using AWS, GCP, or Azure.

That consistency significantly simplifies how queries are processed.
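As a rough illustration of why a standardised data layer simplifies query processing, consider the minimal Python sketch below. It is not CloudKeeper's schema, which the interview describes as proprietary. The `CostRecord` fields and the raw-row keys for each provider are assumptions made purely for illustration; real billing exports use different and much richer columns.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CostRecord:
    """Hypothetical provider-neutral cost row."""
    provider: str      # "aws", "gcp" or "azure"
    service: str
    usage_date: date
    cost_usd: float


# The raw-row keys below are illustrative assumptions, not real export columns.
def from_aws(row: dict) -> CostRecord:
    return CostRecord("aws", row["product"],
                      date.fromisoformat(row["day"]),
                      float(row["unblended_cost"]))


def from_gcp(row: dict) -> CostRecord:
    return CostRecord("gcp", row["service"],
                      date.fromisoformat(row["usage_day"]),
                      float(row["cost"]))


def total_by_service(records):
    """One aggregation that works for every provider once rows are normalised."""
    totals = {}
    for r in records:
        key = (r.provider, r.service)
        totals[key] = totals.get(key, 0.0) + r.cost_usd
    return totals


records = [
    from_aws({"product": "AmazonEC2", "day": "2024-05-01", "unblended_cost": "120.50"}),
    from_aws({"product": "AmazonEC2", "day": "2024-05-02", "unblended_cost": "79.50"}),
    from_gcp({"service": "Compute Engine", "usage_day": "2024-05-01", "cost": "40.00"}),
]
totals = total_by_service(records)
```

The provider-specific logic is confined to the small adapter functions, so every downstream query is written once against `CostRecord`.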

On top of that, we have an intelligence layer that interprets English-language queries in the context of the customer’s environment. If the customer only operates on AWS, the system automatically restricts the analysis to AWS spend. If the customer operates on AWS and GCP together, the platform intelligently pulls information from both environments without requiring additional clarification.
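The scoping behaviour described here could be sketched roughly as follows. This is a simplified assumption about how such a step might work, not the actual Lens GPT logic: the function names are hypothetical, and narrowing the scope when a question names a specific cloud is an extra rule added for the example.

```python
import re

SUPPORTED = ("aws", "gcp", "azure")


def providers_mentioned(question: str) -> set:
    """Providers named explicitly in the question (whole-word match)."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return {p for p in SUPPORTED if p in words}


def resolve_scope(connected: set, question: str) -> list:
    """Pick the clouds to analyse without asking the user to clarify.

    Defaults to every cloud the customer has connected; narrows to the
    clouds named in the question when those are connected.
    """
    available = [p for p in SUPPORTED if p in connected]
    mentioned = providers_mentioned(question)
    named = [p for p in available if p in mentioned]
    return named or available


question = "What caused my cloud cost spike over the last week?"
aws_only = resolve_scope({"aws"}, question)
aws_and_gcp = resolve_scope({"aws", "gcp"}, question)
gcp_named = resolve_scope({"aws", "gcp"}, "Why did GCP costs rise?")
```

An AWS-only customer gets an AWS-only analysis, while a customer on AWS and GCP gets both environments for the same question.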

The consistency of the data structure reduces processing complexity and also keeps the cost of generating responses relatively low.

The combined intelligence of the data layer and conversational context engine enables us to provide coherent multi-cloud responses. Beyond that, much of the architecture remains proprietary.

CIO&Leader: CloudKeeper recently launched the Lens GPT MCP Server. What enterprise demand drove the move beyond a traditional UI into a programmatic conversational interface?

Sanjeev Mittal: Our first launch was Lens GPT directly on the web platform. We continuously monitor customer adoption patterns, including how customers themselves are adopting AI technologies.

We observed that many customers had already integrated tools like ChatGPT, Claude Desktop, Cursor, Kiro, and other AI-native IDE environments into their daily workflows. Finance teams and technical teams were increasingly operating inside those environments rather than traditional dashboards.

Instead of expecting customers to leave those tools and log into our platform separately, it became a natural extension for us to embed our intelligence layer directly where they already work. Now, customers can query Lens GPT directly from within ChatGPT, Claude Desktop, or their IDE environment instead of switching contexts.
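For readers unfamiliar with MCP (Model Context Protocol), the sketch below shows the general shape of the JSON-RPC 2.0 exchange a client such as an AI desktop app or IDE has with an MCP server when it calls a tool. The interview does not describe the tools the Lens GPT MCP Server exposes, so the tool name `ask_cost_question` and the stub backend are hypothetical. A real server would normally be built on an MCP SDK and would also handle initialisation and tool listing, which this sketch omits.

```python
import json


def answer_cost_question(question: str) -> str:
    """Hypothetical stand-in for a real analysis backend."""
    return f"(stub) no data source attached; received: {question}"


# Hypothetical tool registry; the tool name is an assumption.
TOOLS = {"ask_cost_question": answer_cost_question}


def handle(raw: str) -> str:
    """Handle a single JSON-RPC request and return the JSON response."""
    req = json.loads(raw)
    base = {"jsonrpc": "2.0", "id": req.get("id")}

    if req.get("method") != "tools/call":
        return json.dumps({**base, "error": {"code": -32601,
                                             "message": "Method not found"}})

    params = req.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return json.dumps({**base, "error": {"code": -32602,
                                             "message": "Unknown tool"}})

    text = tool(**params.get("arguments", {}))
    # Tool results are returned as a list of content items.
    return json.dumps({**base, "result": {
        "content": [{"type": "text", "text": text}],
        "isError": False,
    }})


request = json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "ask_cost_question",
               "arguments": {"question": "Why did costs spike last week?"}},
})
response = json.loads(handle(request))
```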

Since launching MCP support, adoption has increased significantly because the experience is far more convenient and integrated into daily workflows.

CIO&Leader: Many enterprise conversational platforms today are essentially LLM wrappers layered over datasets. Architecturally, how does a true agentic FinOps system differ from a simple query-response chatbot?

Sanjeev Mittal: The biggest mistake many companies make is ignoring the importance of the data layer itself. The cleanliness, consistency, and structure of the data lake are foundational. Even if the platform expands into new services later, the data architecture must remain standardised across all inputs.

The second challenge historically involved maintaining conversational context. Earlier chatbot systems struggled because context windows were limited. Companies had to rely heavily on RAG architectures and on injecting prior context into prompts to preserve continuity.

Modern LLMs now support significantly larger context windows, which has fundamentally changed the architecture of these systems.

In our case, the platform operates within the boundaries of the customer’s available datasets and cloud access permissions. As users continue interacting with the platform, the system learns from those queries and proactively generates suggested prompts and follow-up questions relevant to the ongoing investigation. That allows the platform to guide users deeper into cost analysis and optimisation workflows.

Unlike traditional chatbots that simply process isolated queries, our system continuously builds contextual understanding around the customer’s cloud environment, learns from repeated usage patterns, and improves the accuracy of responses over time.

CIO&Leader: FinOps historically sits between finance and engineering teams, which often operate with very different priorities. How does a natural language FinOps interface change that relationship?

Sanjeev Mittal: It changes it dramatically. Even before AI, our Lens platform already provided more than 100 dashboards tailored primarily for finance personas rather than DevOps teams. These dashboards helped finance leaders analyse cost anomalies and spending behavior independently.

The problem at the executive level was not necessarily lack of data. It was lack of time.

A CFO does not want to navigate through hundreds of dashboards to identify why cloud costs increased last month. Previously, the process involved calling the FinOps team and then waiting for custom reports, Excel sheets, and manually assembled analyses.

AI for FinOps eliminates that friction entirely. Now a CFO or CIO can directly ask Lens GPT a question such as “Why did my cloud costs spike last week?” and receive the answer within seconds through natural language.
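One common way to answer a "why did costs spike" question is a week-over-week comparison by service. The Python sketch below shows that idea only; it is an assumption for illustration, not a description of how Lens GPT computes its answers, and the sample figures are made up.

```python
from collections import defaultdict


def spike_drivers(rows, top_n=3):
    """Rank services by cost increase between two weeks.

    rows: iterable of (service, week, cost), where week is
    "previous" or "current".
    """
    delta = defaultdict(float)
    for service, week, cost in rows:
        delta[service] += cost if week == "current" else -cost
    return sorted(delta.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Made-up sample data.
rows = [
    ("EC2", "previous", 700.0), ("EC2", "current", 1150.0),
    ("S3", "previous", 200.0), ("S3", "current", 210.0),
    ("RDS", "previous", 300.0), ("RDS", "current", 280.0),
]
drivers = spike_drivers(rows, top_n=2)
```

A conversational layer would then phrase the top entries as a sentence, for example naming the service that accounts for most of the increase.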

The speed and accessibility of information have fundamentally increased executive confidence in cloud financial data.

CIO&Leader: Cloud financial data can reveal infrastructure scale, operational patterns, and enterprise behavior. How are you ensuring security, compliance, and privacy while enabling conversational AI access to this information?

Sanjeev Mittal: We typically support two categories of customers.

The first category involves standard cloud consumption telemetry data. This includes information already available through AWS, GCP, or Azure, such as account IDs, infrastructure consumption metrics, compute usage, and software utilisation patterns.

This is telemetry data rather than sensitive business or personally identifiable information. There is no PII involved in most standard cloud consumption datasets. We maintain strict governance through SOC 2 compliance, ISO certifications, controlled access boundaries, and secure data management practices.

The second category involves customers who combine proprietary enterprise datasets with cloud telemetry data for deeper insights. In those cases, we deploy dedicated tenants with significantly higher isolation and security controls around those environments.

Both categories remain SOC 2 compliant and follow strict governance standards.

CIO&Leader: Agentic AI workflows themselves consume significant cloud resources. How do you optimise the efficiency of Lens GPT while still delivering the 25% cloud savings you promise customers?

Sanjeev Mittal: The 25% savings come from the combined value of Lens, Tuner, and Comet together.

Comet typically delivers 10–12% savings through pricing optimisation. Tuner contributes another 8–10% by eliminating resource wastage and overprovisioning. Lens contributes around 3–5% through visibility-driven optimisation opportunities.
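Adding up the quoted ranges shows how they relate to the 25% figure. The short Python calculation below assumes, as a simplification not stated in the interview, that each percentage is measured against the same baseline spend and can therefore be summed directly.

```python
# Savings ranges quoted in the interview, in percent of cloud spend.
ranges = {"Comet": (10, 12), "Tuner": (8, 10), "Lens": (3, 5)}

low = sum(lo for lo, _ in ranges.values())    # 10 + 8 + 3
high = sum(hi for _, hi in ranges.values())   # 12 + 10 + 5
midpoint = (low + high) / 2

print(f"Combined savings: {low}-{high}% (midpoint {midpoint}%)")
```

Under that assumption the combined range is 21% to 27%, so the 25% figure sits towards the upper half of it.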

Internally, we are also extremely conscious about our own AI infrastructure efficiency.

We do not use LLMs simply for the sake of using LLMs. During the initial product iterations, a larger portion of the system may rely on LLM-based processing. Over time, however, we continuously optimise the architecture and reduce unnecessary LLM dependency.

For example, a product feature set that initially depended on LLMs for eight out of ten functions may eventually reduce that dependency to only three functions after optimisation.

That significantly lowers operational AI costs while preserving functionality.
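One way to picture this reduction in LLM dependency is a router that prefers deterministic handlers and falls back to a model only for functions that still need one. The Python sketch below is an assumption about the general pattern, not CloudKeeper's implementation; `call_llm` is a placeholder that makes no real model call, and the function names are hypothetical.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a model call; hypothetical, returns a canned string."""
    return f"[LLM] {prompt}"


class Router:
    """Dispatch to deterministic code when available, otherwise to the LLM."""

    def __init__(self):
        self.deterministic = {}
        self.llm_calls = 0

    def register(self, name, handler):
        """Replace an LLM-backed function with plain code."""
        self.deterministic[name] = handler

    def run(self, name, payload):
        handler = self.deterministic.get(name)
        if handler is not None:
            return handler(payload)
        self.llm_calls += 1  # track how often the costly path is used
        return call_llm(f"{name}: {payload}")


router = Router()
router.register("total_cost", sum)  # migrated from LLM to plain code

total = router.run("total_cost", [1.0, 2.0])
explanation = router.run("explain_anomaly", "EC2 up 40%")
```

Each function moved into `deterministic` removes one model call per request, which is where the cost reduction in this pattern would come from.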

CIO&Leader: Looking ahead, how close are we to fully autonomous self-healing infrastructure where AI agents can independently detect and resolve cloud cost anomalies without human intervention?

Sanjeev Mittal: Technically, we are not very far away. The capability already exists for agents to provision infrastructure, execute tasks autonomously for hours, and decommission infrastructure afterward. However, the real limitation is not technology. It is risk tolerance.

Autonomous operations will likely mature faster in non-production environments where the consequences of failure are lower. But in production-grade enterprise infrastructure, even a one-in-a-thousand mistake can become extremely expensive.

That is why human oversight will remain important.
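A simple way to express this split between non-production autonomy and production oversight is an approval gate keyed on the environment. The Python sketch below is illustrative only; the environment names and the policy itself are assumptions, not a description of any CloudKeeper product.

```python
from dataclasses import dataclass

# Assumption: environments where agents may act without sign-off.
AUTO_APPROVED_ENVS = {"dev", "staging"}


@dataclass(frozen=True)
class Action:
    description: str
    environment: str  # e.g. "dev", "staging", "prod"


def decide(action: Action, human_approved: bool = False) -> str:
    """Return "execute" or "await_approval" for a proposed agent action."""
    if action.environment in AUTO_APPROVED_ENVS:
        return "execute"
    return "execute" if human_approved else "await_approval"


dev_result = decide(Action("stop idle test cluster", "dev"))
prod_result = decide(Action("downsize database instance", "prod"))
prod_signed_off = decide(Action("downsize database instance", "prod"),
                         human_approved=True)
```

Anything outside the auto-approved set, including unknown environments, waits for a human, which keeps the default on the safe side.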

The role of humans will evolve away from repetitive operational work such as ticket management or manual provisioning. Most of those workflows will become agent-assisted.

But fully autonomous production infrastructure will still take longer because enterprises will continue prioritising operational safety and accountability.
