Arun Shetty, CTO & Senior Director, Cisco India & South Asia, outlines a shift to AI-ready infrastructure, emphasising purpose-built networks, integrated security, and data platforms like Splunk to help enterprises scale AI from pilots to production with resilience and control.

The shift from experimentation to execution is defining the current phase of enterprise AI adoption. As organisations move beyond pilots, the focus is no longer limited to models and algorithms but extends to the foundational infrastructure required to operationalise AI at scale. Rising compute demands, data gravity, network complexity, and security risks are forcing CIOs to rethink how IT environments are architected to support always-on, autonomous systems.
In this conversation, Arun Shetty, CTO & Senior Director, Solutions Engineering at Cisco India & South Asia, explains how Cisco is re-engineering AI-ready infrastructure. He highlights the move toward agentic systems, the need for purpose-built networks, and the concept of a “Secure AI Factory,” alongside the role of observability, data platforms like Splunk, and integrated security in enabling enterprises to scale AI with resilience, trust, and operational simplicity.
CIO&Leaders: Cisco is increasingly shifting toward AI-ready infrastructure that goes beyond hardware. What is the most critical architectural change underway, and how is it enabling Indian enterprises to move from AI pilots to production?
Arun Shetty: If you look at how AI is evolving, it has moved beyond simply answering questions. We are now transitioning into a phase where AI systems can take actions. This marks the emergence of AI agents, and in the future, we will likely see the rise of physical AI as well. This shift is already underway and is fundamentally changing how enterprises operate.
There are three major changes taking place, along with corresponding constraints.
From a technology perspective, there will be a significant increase in traffic, a rise in the number of devices, and an expansion of risk exposure. AI agents will effectively become part of the workforce, contributing to this growth in scale. As traffic and device density increase, the associated risks also grow proportionally.
From an operational standpoint, complexity is becoming a critical challenge. Even today, enterprise environments are highly complex, with a mix of cloud infrastructure, legacy applications, and varied infrastructure strategies across organisations. AI further amplifies this complexity. As complexity increases, it also exposes a gap in skills, which becomes an operational challenge for enterprises.
The third dimension is people. Expectations from both employees and customers are rising significantly. There is an increasing demand for immediate outcomes and seamless experiences, driven by what AI is now capable of delivering. This shift in expectations is a major factor enterprises must prepare for.
These three dimensions—technology scale, operational complexity, and rising expectations—define the broader impact of AI and how organisations must respond.
In addition to these changes, there are three major constraints or architectural considerations that enterprises must address in the AI era.
The first constraint is infrastructure. This includes power, compute, and networking. These foundational elements must scale to support AI workloads, making infrastructure readiness a primary requirement.
The second constraint is trust, which encompasses both security and safety. Security concerns are widely understood, but safety is equally critical and often less discussed. Safety relates to the intrinsic behaviour of AI models and applications. When an input is provided to an AI model, the output is not always consistent. Issues such as hallucinations, toxicity, and unpredictable responses can occur. These behaviours are inherent to how models function, which makes it essential to ensure that models behave in alignment with intended outcomes; a simple sketch of such an output-side check follows this set of constraints. This is why safety becomes a critical component alongside security, contributing to what can be described as a trust deficit.
The third constraint is data. Most AI models today are trained on publicly available data, including text, video, and audio. However, enterprises possess their own proprietary data, which can be leveraged to derive more meaningful outcomes. By using enterprise data, organisations can train, distill, and correlate models more effectively, thereby unlocking greater value from AI implementations.
Taken together, these infrastructure, trust, and data challenges represent the critical areas that enterprises must address as they move from AI experimentation to production at scale.
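To make the safety constraint concrete, the sketch below shows one way an enterprise might screen a model's output before it reaches a user: a blocklist check for sensitive or toxic content, plus a crude grounding check against retrieved context as a proxy for hallucination. This is a minimal illustration under assumed thresholds and helper names, not a Cisco implementation.

```python
# Hypothetical output-side guardrail: screens a model response before it is
# returned to the user. Categories, thresholds, and helpers are illustrative only.
from dataclasses import dataclass

BLOCKED_TERMS = {"credit card number", "password dump"}  # toxicity / leakage proxies

@dataclass
class GuardrailVerdict:
    allowed: bool
    reason: str

def check_response(response: str, source_passages: list[str]) -> GuardrailVerdict:
    lowered = response.lower()
    # 1. Sensitive-content screen (placeholder keyword check).
    for term in BLOCKED_TERMS:
        if term in lowered:
            return GuardrailVerdict(False, f"blocked term: {term}")
    # 2. Crude grounding check: flag answers with no overlap with retrieved context,
    #    a rough proxy for hallucination in a retrieval-augmented deployment.
    if source_passages and not any(
        word in lowered for p in source_passages for word in p.lower().split()[:20]
    ):
        return GuardrailVerdict(False, "answer not grounded in retrieved context")
    return GuardrailVerdict(True, "ok")

if __name__ == "__main__":
    verdict = check_response("The quarterly revenue grew 12%.",
                             ["Quarterly revenue grew 12% year over year."])
    print(verdict)
```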
CIO&Leaders: Organisations are heavily investing in AI infrastructure, often prioritising compute at scale. From a systems engineering perspective, how should CIOs balance power and data demands for AI while integrating with existing legacy infrastructure?
Arun Shetty: From an infrastructure standpoint, the starting point is always the use case. Enterprises must clearly define what they aim to achieve. Based on that, they need to decide whether workloads should be deployed on-premises or in the cloud. In many cases, organisations initially experiment in the cloud and later bring workloads back on-premises due to concerns around data security and data sovereignty.
Identifying the use cases and aligning the required infrastructure to support them is therefore a critical first step. Once this clarity is established, enterprises can determine the number of GPUs required, the level of compute needed, and the overall architecture.
At Cisco, this approach is defined as a Secure AI Infrastructure. This is not limited to compute alone; it is a full-stack architecture developed in collaboration with NVIDIA. The stack ensures end-to-end observability and embeds security across every layer—from silicon to applications. Security is not an add-on but an integral part of the entire architecture.
These architectures are validated through two frameworks. NVIDIA provides certification through its Enterprise Reference Architecture, while Cisco offers its own Cisco Validated Designs (CVDs) to ensure that the infrastructure is optimised and deployment-ready.
Once the infrastructure is established for pilot workloads, the next step is scalability.
The first stage is scale-up, where the existing stack is expanded to increase compute capacity.
The second stage is scale-out, where infrastructure extends beyond a single rack to multiple racks within a data center.
The third stage is scale-across, which becomes necessary when local constraints such as power availability limit further expansion. In such scenarios, workloads are distributed across multiple data centers, often located closer to power sources.
Infrastructure must also be viewed holistically. It includes not only compute, but also network and storage. Cisco works with ecosystem partners to deliver integrated storage solutions, while leveraging the broader NVIDIA architecture to ensure successful deployment of AI use cases.
On the networking side, high throughput is critical. Cisco’s Silicon One G300 enables switching capacity of up to 102.4 Tbps within data centers, while the P200 routing platform supports up to 51.2 Tbps for inter-data center connectivity. These capabilities are essential because AI workloads—particularly inferencing driven by agents—operate continuously and require sustained high bandwidth. As organisations scale across data centers, high-speed interconnects become equally important.
In addition, Splunk plays a key role through its Data Fabric. It enables the ingestion and correlation of large volumes of data, which can then be leveraged for advanced analytics, including MachineGPT, to unlock additional use cases and insights.
This integrated approach defines what Cisco refers to as a Secure AI Factory, where infrastructure is designed with both performance and security as foundational principles.
From a security and safety perspective, protection must extend beyond the infrastructure to the AI applications themselves. Cisco AI Defense addresses this requirement by enabling organisations to discover all AI applications in use, detect vulnerabilities within them, and assess risks associated with downloaded models.
It also provides runtime protection, ensuring that AI applications remain secure while operating. This end-to-end capability—from discovery to runtime protection—forms a critical component of the Cisco Secure AI Factory.
CIO&Leaders: With the integration of Splunk, how is Cisco re-engineering its networks to function as primary sensors for AI-driven threats?
Arun Shetty: To understand this, it is important to first examine the role Splunk plays in completing the broader Cisco solution.
At the core is the concept of digital resilience, which is the ability of an organisation to remain securely operational despite disruptions. These disruptions could range from IT outages and security breaches to unforeseen incidents such as configuration errors. In some cases, incidents cannot be prevented, making the ability to respond effectively just as critical as prevention. As a result, digital resilience has become a board-level priority for organisations.
Enterprise environments today are inherently complex. They span private cloud, public cloud, legacy applications, and modern microservices. Additionally, organisations often rely on infrastructure they do not fully control, such as the internet or SaaS platforms. This lack of ownership leads to limited end-to-end visibility, which in turn delays issue detection and resolution.
To address this, Cisco focuses on three key pillars to achieve digital resilience.
The first is assurance, which relates to end-to-end connectivity across the entire digital ecosystem—from on-premises environments to the cloud. Systems continuously generate telemetry data, which can be collected, distilled, and correlated to identify the exact source of an issue. With complete visibility, organisations can determine whether a problem originates within their own infrastructure or from an external service.
The second pillar is observability, which ensures a consistent and high-quality user experience. By maintaining end-to-end visibility, organisations can monitor application performance, reduce downtime, and proactively address issues before they impact users.
The third pillar is security operations, which involves the ability to prevent, detect, investigate, and respond to threats. The shift here is from a prevention-only mindset to a comprehensive response-driven approach.
This is where Splunk becomes critical. It acts as a unified data platform, aggregating telemetry from across the enterprise. Different teams—IT, security, and engineering—may use different tools, but they operate on the same underlying data. This shared data foundation simplifies problem identification and accelerates resolution.
The integration of Splunk into Cisco’s architecture enables all data generated from Cisco platforms to be ingested and analysed within a single framework. This is referred to as the Cisco Data Fabric, which allows organisations to derive actionable insights and make informed decisions in real time.
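As a rough illustration of what ingesting data into a single framework can look like in practice, the snippet below forwards one network telemetry record to Splunk's HTTP Event Collector so it can be correlated with other sources. The endpoint, token, index, and field names are placeholders; this is a sketch rather than a Cisco Data Fabric reference design.

```python
# Minimal sketch: forwarding a network telemetry record to Splunk's HTTP Event
# Collector (HEC) so it can be correlated with other sources. The endpoint URL,
# token, index, and field names are placeholders, not a Cisco reference design.
import json
import urllib.request

HEC_URL = "https://splunk.example.com:8088/services/collector/event"  # placeholder
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"                    # placeholder

def send_telemetry(event: dict) -> int:
    payload = json.dumps({
        "event": event,
        "sourcetype": "cisco:telemetry",   # illustrative sourcetype
        "index": "network_ops",            # illustrative index
    }).encode("utf-8")
    req = urllib.request.Request(
        HEC_URL,
        data=payload,
        headers={"Authorization": f"Splunk {HEC_TOKEN}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # raises on HTTP errors
        return resp.status

if __name__ == "__main__":
    status = send_telemetry({"device": "leaf-01", "if_util_pct": 93.4,
                             "drops": 120, "severity": "warning"})
    print("HEC response status:", status)
```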
In the context of AI, particularly with the rise of AI agents, the requirement shifts toward detecting and responding at machine speed. Traditional response times are no longer sufficient. Organisations must be able to process signals, identify threats, and take action in real time.
This is driving the evolution toward Agentic Security Operations. One aspect of this is enabling advanced analytics and design capabilities that allow organisations to move from reactive to proactive security models. Another is the expansion of the Security Operations Center (SOC) through AI agents.
These agents can take on roles such as detection assistants, detection agents, response orchestrators, and autonomous response agents. They automate workflows, assist human operators, and in some cases, take independent actions based on predefined policies.
This shift represents the next phase of security operations—where AI agents augment human capabilities, improve efficiency, and enable faster, more intelligent responses. It reinforces the role of Splunk not only in digital resilience but also as a foundational platform for AI-driven security in modern enterprise environments.
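These escalating agent roles can be read as increasing levels of autonomy, each gated by policy. The sketch below expresses that tiering in simplified form; the role names echo the interview, while the confidence thresholds and "blast radius" policy are assumptions made for illustration.

```python
# Illustrative tiering of SOC AI-agent autonomy, gated by policy. Role names follow
# the interview (assistant -> orchestrator -> autonomous responder); thresholds and
# the policy fields are hypothetical.
from enum import Enum

class Autonomy(Enum):
    ASSIST = "detection_assistant"         # summarise and suggest, human decides
    ORCHESTRATE = "response_orchestrator"  # prepare a playbook, human approves
    AUTONOMOUS = "autonomous_response"     # execute a pre-approved playbook directly

def decide_autonomy(confidence: float, blast_radius: str, preapproved: bool) -> Autonomy:
    """Pick how much the agent may do for a given detection."""
    if preapproved and confidence >= 0.95 and blast_radius == "single_host":
        return Autonomy.AUTONOMOUS
    if confidence >= 0.80:
        return Autonomy.ORCHESTRATE
    return Autonomy.ASSIST

if __name__ == "__main__":
    # Example: a high-confidence malware beacon from one endpoint, with an
    # isolation playbook the security team has pre-approved.
    level = decide_autonomy(confidence=0.97, blast_radius="single_host", preapproved=True)
    print("Agent may act at level:", level.value)
```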
CIO&Leaders: Many AI initiatives fail to scale due to data center limitations rather than model performance. What are the key blind spots that prevent AI projects from scaling?
Arun Shetty: This is a critical point. The network plays a central role in AI environments, particularly with the rise of AI agents. Unlike traditional workloads, agents operate continuously, which creates a persistent and consistent load on the network. As a result, infrastructure must be designed to support 24/7 operations.
The first requirement is a purpose-built network for AI. This includes high-speed Ethernet and advanced switching capabilities. For example, emerging architectures can support speeds ranging from 800 Gbps to 1.6 Tbps per port within the LAN environment. Such capabilities are essential to handle the scale and performance requirements of AI workloads.
The second requirement is operational simplicity. Enterprises must have the ability to quickly identify where issues occur and resolve them efficiently. Without clear visibility and simplified operations, scaling becomes difficult.
The third requirement is integrated security. Security must be embedded directly into the network fabric. Given the scale of AI-driven traffic, the network itself must be inherently secure and capable of scaling without introducing vulnerabilities.
At Cisco, these challenges are addressed through the Silicon One architecture and integration with NVIDIA Spectrum-X. This ensures that infrastructure limitations in AI environments are mitigated through optimised performance, scalability, and speed.
CIO&Leaders: As AI agents become integral to enterprise environments, how does a self-healing, intelligent network operate under the load of autonomous systems?
Arun Shetty: One of the primary challenges in enterprise environments today is the high mean time to detect (MTTD) and mean time to repair (MTTR), largely due to limited visibility. Improving visibility is therefore essential.
The industry is moving beyond traditional monitoring and centralised management toward Agentic Operations (Agentic Ops). This represents a shift from passive monitoring to active execution, where AI agents play a key role.
At Cisco, we introduced AI Canvas as part of this approach. AI Canvas enables three core capabilities.
First, multi-domain troubleshooting. When an issue arises, it may originate from the network, application, server, or other infrastructure components. A self-healing system must be capable of diagnosing issues across all these domains.
Second, collaborative operations. In many enterprises, troubleshooting is performed in silos across different teams. By providing all teams with access to a unified data view, collaboration improves and problem resolution becomes more efficient.
Third, the use of a proprietary Deep Network Model (DNM). This model enables AI-driven diagnostics and automation. For example, an operator can initiate a troubleshooting request, after which AI agents analyse network performance, identify the root cause, and recommend corrective actions.
In practice, the system can go further. With appropriate governance, agents can autonomously execute corrective actions, such as reconfiguring network parameters to resolve performance issues. The human-in-the-loop model remains available, but organisations can also enable fully autonomous operations where appropriate.
These capabilities extend across domains. If an issue is not network-related but originates in the application layer, the system can identify and address it accordingly.
Agentic Ops represents a shift toward simplified operations in highly complex environments, enabling organisations to move from reactive troubleshooting to proactive and autonomous resolution.
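The governance point above, where agents may either act autonomously or defer to a human, can be pictured as a simple approval gate. The sketch below is hypothetical and is not an AI Canvas API; the risk categories and remediation actions are illustrative only.

```python
# Hypothetical human-in-the-loop gate for agent-proposed remediations. The policy
# fields and remediation actions are illustrative; this is not an AI Canvas API.
from dataclasses import dataclass, field

@dataclass
class Remediation:
    target: str
    action: str   # e.g. "adjust_qos", "restart_bgp_session"
    risk: str     # "low" | "medium" | "high"

@dataclass
class Policy:
    auto_approve_risks: set = field(default_factory=lambda: {"low"})

def apply_or_queue(fix: Remediation, policy: Policy, approval_queue: list) -> str:
    if fix.risk in policy.auto_approve_risks:
        # In a real system this would call the network controller; here we just report.
        return f"auto-applied {fix.action} on {fix.target}"
    approval_queue.append(fix)
    return f"queued {fix.action} on {fix.target} for human approval"

if __name__ == "__main__":
    queue: list = []
    print(apply_or_queue(Remediation("leaf-01", "adjust_qos", "low"), Policy(), queue))
    print(apply_or_queue(Remediation("core-02", "restart_bgp_session", "high"), Policy(), queue))
    print("pending approvals:", len(queue))
```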
CIO&Leaders: Beyond encryption, what safeguards must be embedded at the network layer to safely deploy autonomous AI systems?
Arun Shetty: The adoption of AI agents significantly expands the attack surface. Addressing this requires a structured approach across three dimensions.
The first is protecting the enterprise from AI agents. Since agents can operate autonomously, it is essential to ensure that they act within defined boundaries. This requires extending Zero Trust principles to AI agents. Organisations must treat agents as digital employees, with defined identities, access controls, and traceability. Each agent should have a human owner, and all actions must be auditable. Strict identity and access management ensures that agents only perform authorised actions.
The second is protecting AI systems from external threats. This includes both model-level risks and external manipulation attempts. Cisco AI Defense addresses this by enabling discovery of all AI applications, conducting vulnerability assessments, and validating models before deployment. It also includes supply chain security checks for downloaded models and runtime guardrails to ensure safe operation.
The third is detecting and responding at machine speed. As discussed earlier, platforms like Splunk enable real-time detection and response through Agentic Security Operations. This ensures that threats are identified and mitigated at the speed required in AI-driven environments.
Together, these three layers of identity, protection, and rapid response form the foundation of secure AI deployment.
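One way to picture "agents as digital employees" is a zero-trust identity record that ties each agent to a human owner, scopes its allowed actions, and audits every attempt. The field names and checks below are assumptions for illustration, not a Cisco schema.

```python
# Illustrative zero-trust record for an AI agent treated as a "digital employee":
# identity, human owner, scoped permissions, and an audit trail. Field names and
# the authorisation check are assumptions, not a Cisco schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentIdentity:
    agent_id: str
    human_owner: str       # every agent maps to an accountable person
    allowed_actions: set
    audit_log: list = field(default_factory=list)

    def authorise(self, action: str, resource: str) -> bool:
        allowed = action in self.allowed_actions
        self.audit_log.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "resource": resource,
            "allowed": allowed,
        })
        return allowed

if __name__ == "__main__":
    agent = AgentIdentity("inv-agent-07", "owner@example.com",
                          allowed_actions={"read_inventory", "create_ticket"})
    print(agent.authorise("read_inventory", "warehouse-db"))   # True, and logged
    print(agent.authorise("delete_records", "warehouse-db"))   # False, still logged
    print(len(agent.audit_log), "audited actions")
```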
CIO&Leaders: Compared to the cloud transition, what makes the AI infrastructure shift more complex for modern CIOs?
Arun Shetty: The defining factor is the pace of innovation. AI is evolving at a significantly faster rate than previous technology shifts.
To illustrate, the United States is projected to require approximately 62 GW of power for AI workloads by 2028. Context window sizes are expanding rapidly, currently reaching around one million tokens. At the same time, global AI spending is expected to reach approximately US $3.3 trillion.
AI has the potential to either disrupt or accelerate every industry. Organisations must therefore proactively identify relevant use cases and align their strategies accordingly.
From an enterprise perspective, this requires a structured approach to identifying opportunities, deploying use cases, and scaling infrastructure. At the same time, security and safety remain critical priorities, requiring continuous visibility and governance.
In the Indian context, the scale of transformation is equally significant. Data center capacity is expected to reach approximately 8 GW by 2030, with infrastructure investments projected to reach US $70 billion by 2026. AI is also expected to contribute around US $1.7 trillion to the Indian economy.
These indicators highlight the magnitude of the shift underway. AI will fundamentally reshape how organisations operate, both by introducing disruption and by accelerating existing processes. Enterprises must be prepared to navigate both outcomes.