The Agentic Era Part 5: Building the Agentic Cloud
Why the current cloud infrastructure is ill-suited for agentic systems and requires a new "agentic-native" approach.
The artificial intelligence landscape stands at an inflection point that should profoundly reshape how we think about computational infrastructure. The emergence of agentic AI systems—autonomous agents capable of complex reasoning, coordination, and persistent memory—demands architectural approaches that differ from today's cloud computing paradigms.
This transformation raises critical questions about the $320 billion in projected infrastructure investment by hyperscale providers during 2025 and whether this massive spending targets the right architectural foundations.
Over the first four parts of this Agentic Era series, I’ve established that orchestration creates new competitive moats, that architectural choices between model-centric and workflow-centric approaches determine value capture, that protocols like MCP and A2A form the invisible operating system for coordination, and that outcome-based frameworks must replace traditional SaaS metrics.
For this latest chapter, I want to examine how infrastructure choices and transformation will determine which companies capture disproportionate returns, and how that may play out across three infrastructure categories: hyperscalers, neoclouds, and emerging agentic clouds.
The Stakes
Consider the scale of global infrastructure investment currently underway.
Amazon leads hyperscalers with $100 billion of investment announced for this year, followed by Microsoft at $80 billion, Google at $75 billion, and Meta at $60-65 billion. Taken together, this hyperscaler spending represents a roughly 60% increase over 2024's already massive $197 billion deployment.
While these incumbents have a head start of years, if not decades, their supremacy seemed at least partially vulnerable to brash neocloud upstarts such as CoreWeave.
In March, CoreWeave’s $23 billion valuation at IPO sparked celebration as a standout AI success. Dubbed the “AI hyperscaler” with 250k+ GPUs and NVIDIA backing, the company boasted $1.92 billion in revenue, an $11.9 billion OpenAI contract, and a $35 billion IPO target. Its higher GPU performance at lower cost appeared to cement its leadership in today’s AI infrastructure.
Yet as CoreWeave neared its IPO, doubts grew about the durability of its moat and business model. The company trimmed its IPO targets amid a challenging macroeconomic environment. After initial volatility, however, CoreWeave's stock has soared 316%, helped in part by its recent announcement that it had raised $7.5 billion in debt.
Amid this renewed euphoria, CoreWeave faces a strategic question: how to adapt to the agentic challenge. So far, its focus has been on meeting demand and expanding the size of its AI infrastructure.
The Architectural Gap
Current cloud infrastructure operates according to principles that align well with traditional AI applications but create fundamental constraints for agentic systems.
The prevailing model is that of a traditional library system. Users request specific resources, receive them through centralized channels, and return them when finished. This model perfectly suits today's AI Agents, which are task-specific systems that operate with limited autonomy and reactive behaviors, aligning well with existing infrastructure designs prioritizing single-tenant GPU utilization and hub-and-spoke networking.
Agentic AI, however, demands what resembles a collaborative research university: multiple specialized entities working simultaneously, sharing persistent knowledge, and coordinating dynamically. This requires four critical architectural dimensions that existing providers handle poorly:
· Multi-agent collaboration necessitates sophisticated orchestration where specialized agents—planners, retrievers, synthesizers—coordinate through persistent communication channels.
· Dynamic task decomposition requires systems that adaptively redistribute computational workloads based on real-time changes.
· Persistent memory architectures must support extended multi-agent workflows rather than stateless execution cycles.
· Outcome-based resource allocation aligns infrastructure costs with successful task completion rather than raw computational consumption.
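The coordination pattern behind the first three dimensions can be sketched in a few lines. This is a minimal illustration, not a real framework: the agent classes, method names, and shared-memory structure below are all hypothetical, and a production planner would call an LLM rather than return a hardcoded task list.

```python
# Minimal sketch of the planner/retriever/synthesizer pattern described
# above. All class and method names are illustrative, not a real API.
from dataclasses import dataclass, field


@dataclass
class SharedMemory:
    """Persistent context shared across agents within one workflow."""
    context: dict = field(default_factory=dict)

    def write(self, key: str, value) -> None:
        self.context[key] = value

    def read(self, key: str):
        return self.context.get(key)


class Planner:
    def decompose(self, goal: str) -> list[str]:
        # Dynamic task decomposition: in practice this would call an LLM.
        return [f"retrieve facts for: {goal}", f"synthesize answer for: {goal}"]


class Retriever:
    def run(self, task: str, memory: SharedMemory) -> None:
        # Persist intermediate results instead of returning statelessly.
        memory.write("facts", f"facts gathered for ({task})")


class Synthesizer:
    def run(self, task: str, memory: SharedMemory) -> str:
        facts = memory.read("facts")
        return f"answer built from [{facts}]"


def orchestrate(goal: str) -> str:
    memory = SharedMemory()  # persistent channel between agents
    tasks = Planner().decompose(goal)
    Retriever().run(tasks[0], memory)
    return Synthesizer().run(tasks[1], memory)


print(orchestrate("summarize Q2 infrastructure spend"))
```

The key contrast with today's stateless execution cycles is the `SharedMemory` object that outlives any single agent call: each agent reads and writes context that subsequent agents depend on, which is exactly what hub-and-spoke, request-response infrastructure handles poorly.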
Memory architecture illustrates this discontinuity: while traditional AI agents require 8-16 GB of memory per TFLOP of compute, industry experts estimate that agentic systems demand 20-32 GB per TFLOP to support persistent context storage and inter-agent coordination. This expansion affects power delivery, cooling infrastructure, and facility economics across multiple dimensions simultaneously.
Networking and Storage Transformation
The transition to agentic architectures demands parallel evolution in both networking and storage paradigms. Agent-to-agent communication requires sub-100ms latency with 1-10 Gbps dedicated bandwidth, fundamentally altering data center topology from hub-and-spoke designs toward mesh networking—like transforming a centralized transportation system into an interconnected urban grid where every node can communicate directly with multiple others.
Traditional load balancer designs, optimized for routing client requests to server endpoints, cannot efficiently handle direct coordination protocols where agents maintain persistent communication channels with multiple counterparts simultaneously. This networking transformation coincides with explosive storage demands: vector databases can experience 10x data expansion when embedding text for agent reasoning, with total storage reaching 30x original data size after indexing and context persistence.
Organizations deploying agentic systems must budget for 10-30x storage expansion and 3-5x memory overhead compared to traditional applications, creating I/O patterns that demand rapid context retrieval while maintaining consistency across distributed agent coordination. This necessitates specialized approaches to data locality and caching strategies that optimize for task success rather than resource utilization—a shift from computational optimization to coordination intelligence.
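The multipliers above translate directly into capacity planning. The sketch below applies the cited 10-30x storage and 3-5x memory overheads to a hypothetical baseline; the 500 GB and 256 GB input figures are made-up examples, not real deployment data.

```python
# Back-of-envelope capacity planning using the multipliers cited above.
# Baseline figures are hypothetical inputs, not real deployment data.

def agentic_capacity(base_data_gb: float, base_memory_gb: float) -> dict:
    """Apply the 10-30x storage and 3-5x memory overheads as (low, high) ranges."""
    return {
        "storage_gb": (base_data_gb * 10, base_data_gb * 30),
        "memory_gb": (base_memory_gb * 3, base_memory_gb * 5),
    }


# Example: 500 GB of source data, 256 GB of memory under a traditional stack.
plan = agentic_capacity(500, 256)
print(plan["storage_gb"])  # (5000, 15000) -> 5-15 TB after embedding and indexing
print(plan["memory_gb"])   # (768, 1280)
```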
Protocol adoption accelerates these architectural demands. Anthropic's Model Context Protocol (MCP), launched in November 2024, has been adopted by Microsoft, OpenAI, and Google, with integrations spanning platforms such as Azure AI Foundry that enable AI agents to interact with external systems; Microsoft's embedding of MCP in Windows 11 demonstrates enterprise-scale deployment. However, open-source MCP servers have faced command injection vulnerabilities, highlighting the security challenges that come with scaling agent deployment across enterprise environments.
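Concretely, MCP is layered on JSON-RPC 2.0, so agent-to-tool traffic takes the shape below. The tool name and arguments here are hypothetical, chosen only to show where the security concern lives: untrusted argument fields that a careless server might interpolate into shell commands.

```python
# General shape of an MCP tool-invocation request (JSON-RPC 2.0 framing).
# The tool name and arguments are hypothetical illustrations.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_documents",  # hypothetical tool exposed by a server
        "arguments": {"query": "Q2 capex by hyperscaler"},
    },
}

# Untrusted fields like params.arguments are where command injection
# risks arise if a server splices them into shell commands unescaped.
print(json.dumps(request, indent=2))
```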
The Neocloud Inflection Point
Specialized cloud providers like CoreWeave, Lambda Labs, and Together AI illustrate the opportunities and strategic challenges in the rapidly evolving landscape of infrastructure specialization.
CoreWeave’s success, for instance, stems from addressing GPU scarcity through hardware arbitrage, delivering high GPU performance at lower costs via architectural optimization for single-tenant GPU utilization and predictable consumption patterns. Similarly, Lambda Labs provides cost-effective GPU cloud services tailored for AI workloads, while Together AI offers serverless access to streamline AI model training and inference.
These advantages may prove transient as market demands shift toward infrastructure emphasizing coordination and adaptability over raw computational efficiency. The core design principles underlying neocloud success—stateless applications, dedicated hardware allocation, and hub-and-spoke networking—misalign with agentic architecture requirements. Agentic systems depend on stateful interactions, decentralized processing capabilities, and low-latency peer-to-peer communication to enable autonomous agents to coordinate dynamically and share context in real-time.
Retrofitting neocloud facilities to meet these requirements could present substantial cost and complexity challenges, creating an inflection point where established business models should evolve to remain competitive. While neoclouds excel at powering current AI-driven workloads, they are not necessarily optimal solutions for agentic paradigms. Their long-term competitive position depends on successfully adapting to these emerging technological demands without abandoning the focused approaches that enabled their initial success.
Hyperscaler Structural Advantages in Architectural Transition
Neoclouds also lack the vertical integration advantages that hyperscalers possess. Custom silicon design, enterprise relationship depth, and platform ecosystem development require sustained investment and organizational capabilities that specialized providers cannot replicate without fundamental business model transformation.
Hyperscale cloud providers possess decisive structural advantages for developing agentic-native infrastructure through a combination of vertical integration, enterprise positioning, and capital deployment capabilities that smaller providers cannot replicate. Their success depends on transformation speed rather than capital scale alone, but their integration depth creates compound advantages across multiple dimensions simultaneously.
Recent earnings data illustrates both the scale of current investment and continued demand growth that justifies architectural expansion.
Microsoft's Azure revenue grew 35% year-over-year, with AI services contributing 16 points of that growth; the company processed over 100 trillion tokens during the quarter, and more than 10,000 organizations adopted its new agent service within four months of launch. Amazon, meanwhile, continues to experience supply constraints across AWS despite massive capacity additions. Management notes sustained demand growth that supports continued infrastructure investment, while highlighting capacity allocation challenges that may favor providers with comprehensive platform capabilities.
Unlike neoclouds that must abandon proven business models, hyperscalers can invest in agentic-native capabilities while maintaining existing revenue streams.
Enterprise relationship leverage provides competitive protection through existing customer trust and procurement integration. Rather than requiring organizations to evaluate entirely new infrastructure platforms, hyperscalers extend established relationships into agentic capabilities through trusted channels. Microsoft's integration of MCP across multiple enterprise platforms exemplifies this approach, while Google leverages Workspace relationships and Amazon builds on AWS enterprise trust.
However, these advantages face important constraints that may limit adaptation speed. Hyperscaler scale creates organizational inertia that can slow architectural transformation, particularly when existing infrastructure investments represent substantial sunk costs. Their diverse customer bases with varying requirements may limit their ability to optimize specifically for agentic workloads without compromising performance for traditional applications.
Emergent Agentic Clouds: Purpose-Built Infrastructure Solutions
While hyperscalers possess structural advantages for comprehensive agentic infrastructure, a category of emergent agentic cloud providers demonstrates the performance gains possible when architectures are designed from first principles for agentic requirements rather than adapted from traditional cloud infrastructure.
The architectural approach of these emergent providers addresses core agentic requirements through design principles rather than retrofitting existing platforms. Memory management optimizes specifically for persistent context storage across agent sessions. Networking infrastructure supports the low-latency agent communication patterns that coordination protocols require. Resource allocation systems enable outcome-based pricing models that align infrastructure costs with successful task completion.
Daytona Cloud represents one example of this emerging category, achieving sub-90ms sandbox creation times with native Docker compatibility specifically engineered for AI agent workflows. The platform supports stateful environments that maintain persistent context across agent interactions, addressing critical requirements that traditional cloud platforms handle poorly. Similar specialized providers are targeting specific components of the agentic infrastructure stack, including observability tools for multi-agent monitoring and optimization platforms for agentic workload management.
The emergence of these purpose-built providers highlights the performance gaps in traditional infrastructure and demonstrates substantial market demand for specialized agentic capabilities that incumbents cannot deliver efficiently without fundamental architectural redesign.
Economic Model Evolution
The shift toward agentic systems introduces changes in economic models that extend beyond technical architecture to encompass business model innovation. Traditional cloud economics operate on consumption-based pricing where infrastructure costs correlate with resource utilization—compute hours, storage capacity, network bandwidth—regardless of task completion success or coordination efficiency.
Agentic systems enable outcome-based pricing that aligns infrastructure costs with successful task completion rather than resource consumption alone. This transformation demands new business models from providers, with success metrics focused on task completion rates and agent coordination efficiency rather than traditional uptime or throughput measurements. It also creates opportunities for higher margins through architectures optimized for agentic workflows that deliver measurable business outcomes.
This economic transformation shapes infrastructure design priorities. Providers must optimize for task success over raw resource utilization, balancing performance, redundancy, and cost efficiency in ways that traditional cloud economics do not reward. Pricing models will favor providers who can demonstrate measurable improvements in agent coordination capabilities, fundamentally redefining competitive benchmarks for infrastructure performance in agentic environments.
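The contrast between the two pricing models can be made concrete with simple arithmetic. All rates below are invented for illustration; real provider pricing varies widely.

```python
# Illustrative contrast between consumption-based and outcome-based
# pricing. All rates are made-up assumptions for the comparison.

GPU_HOUR_RATE = 4.00      # $/GPU-hour (consumption model, hypothetical)
PRICE_PER_SUCCESS = 1.50  # $/completed task (outcome model, hypothetical)


def consumption_bill(gpu_hours: float) -> float:
    # Billed on raw utilization, regardless of whether tasks succeed.
    return gpu_hours * GPU_HOUR_RATE


def outcome_bill(tasks_attempted: int, success_rate: float) -> float:
    # Billed only on completed tasks; coordination efficiency sets margin.
    return tasks_attempted * success_rate * PRICE_PER_SUCCESS


# 1,000 tasks consuming 200 GPU-hours with an 85% completion rate:
print(consumption_bill(200))     # 800.0
print(outcome_bill(1000, 0.85))  # 1275.0
```

Under the consumption model, the provider's revenue is fixed by utilization; under the outcome model, raising the success rate from 85% to 95% raises revenue with no extra hardware, which is why coordination efficiency, not raw throughput, becomes the competitive benchmark.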
Strategic Implications: Building for the Agentic Future
The agentic shift presents a fundamental challenge: building infrastructure for a technology whose full potential remains uncertain. ChatGPT emerged just 2.5 years ago, yet the industry is racing to lay tracks for a train whose destination remains unclear but continues gathering unstoppable momentum.
This momentum reflects the transformation's scale and inevitability. With 33% of enterprise software projected to incorporate agentic AI by 2028 and autonomous agents predicted to handle 15% of daily work decisions, infrastructure requirements will expand across all dimensions of agentic architecture. Success demands comprehensive capabilities spanning coordination, memory, networking, and economic models rather than optimization of traditional computational paradigms.
The capital allocation challenge compounds this complexity. Providers must simultaneously fund uncertain agentic technologies while preserving existing revenue streams, balancing innovation acceleration with operational stability during an extended transition period. Hyperscalers benefit from diversified portfolios that provide investment flexibility, yet all infrastructure providers face pressure to commit substantial resources without clear short-term returns.
Organizations that approach this transition as an incremental upgrade rather than foundational transformation risk displacement by competitors who recognize that the future belongs to platforms designed for intelligence, not merely processing power. The agentic era demands infrastructure that thinks, coordinates, and adapts to dynamic requirements while maintaining the reliability and scale that enterprise applications require.
The providers that successfully navigate this architectural transformation will establish dominant positions in the next phase of cloud computing evolution, while those that optimize for yesterday's requirements risk obsolescence in tomorrow's agentic landscape.