Gemini 3 AI Discontinuity: Decoding Google's Full-Stack Threat to OpenAI and NVIDIA
The winner won't be determined by the benchmarks but rather by who controls the full stack, from the chips to the model to the distribution.
Google’s Gemini 3 Pro, a frontier AI model trained solely on TPUs, outperforms competitors like ChatGPT and Claude on coding, reasoning, and multimodality benchmarks, signaling a shift from isolated model supremacy to full-stack ecosystems. That shift exposes OpenAI’s vulnerabilities in distribution and tokenomics amid rising Chinese open-source rivals, partly challenges NVIDIA’s hardware dominance by validating custom silicon, and redefines AI competition around orchestration, infrastructure sovereignty, and global reach in an emerging oligopoly.
In a leaked memo written a few weeks ago but widely reported just last week, OpenAI CEO Sam Altman had apparently caught wind that Google was making major progress on its artificial intelligence development.
Google’s next version of Gemini could “create some temporary economic headwinds for our company,” Altman wrote, according to The Information, adding, “I expect the vibes out there to be rough for a bit.”
His assessment proved correct.
On November 18, Google released Gemini 3 Pro. I’ll dive into some of the most critical benchmarks below in more detail and explain why they matter. In general, the new version topped the performance of GPT-5.1 and Anthropic’s Claude in several critical ways. That includes coding, and more specifically, agentic coding.
As I’ve written previously, the “coding wedge” is key to winning the enterprise market. Claude Code helped Anthropic establish an enterprise advantage, and OpenAI hoped to close that with GPT-5. Now, here comes Gemini 3 Pro, a frontier AI from a company that can bundle it into the productivity and cloud stacks already used by many developers and workforces and leverage its global consumer and enterprise distribution platforms.
With uncomfortable questions already being asked about OpenAI’s finances, news that a sleeping giant had awoken won’t help.
As it turned out, Altman and OpenAI aren’t the only ones who have something to fear from Google’s AI resurgence. Gemini 3 represents the first frontier model trained entirely on Google’s Tensor Processing Units (TPUs), not NVIDIA chips. While Google does have some partnerships with NVIDIA, the company is less dependent on the chipmaker than many of the other members of the circular, concentrated dealmaking club that has come to define the frontier AI ecosystem.
Many of those players, including NVIDIA, are facing growing scrutiny over record CapEx spending despite seemingly strong topline results over the past month. Last week, NVIDIA’s stock whipsawed after its earnings. Chip stocks are down globally. Bloomberg’s Magnificent 7 Index is down 7.6% from October 29 through November 21.
There is a notable exception to this investor skepticism: Google is up 18% over the past month.
Figure 1: Yahoo stock chart comparing 30-day stock performance of Google, NVIDIA, Meta, Amazon, Oracle, and Microsoft.
Gemini 3’s arrival is confirmation of a structural shift in terms of where AI advantage and profit will come from. The winner will be determined not by who rolls out the best model this week, but rather by who controls the full stack, from the chips to the model to the distribution. That player will be able to better manage inference costs, allowing them to embed AI everywhere. In turn, intelligence will be abundant and cheaper for customers.
This discontinuity redefines competitive dynamics by redistributing value across the stack. It exposes the fragility of OpenAI’s competitive position while proving that the infrastructure layer is far from settled.
In this article, I want to explore the implications of this critical shift.
Gemini 3’s Technical Supremacy and What It Means
Gemini 3 Pro is the strongest frontier model available today.
The good news for Google is that it didn’t need to rattle off a lot of data points to make its case. On a qualitative, user-experience level, people saw and felt the difference almost right out of the box.
Still, the data is unambiguous. Here are some of the benchmarks that I think matter the most:
76.2% on SWE-Bench Verified
54.2% on Terminal-Bench 2.0
37.5% on Humanity’s Last Exam (versus GPT-5.1’s 26.5%, Claude’s 13.7%, and Kimi K2’s 44.9%)
91.9% on GPQA Diamond (versus GPT-5.1’s 88.1% and Claude’s 83.4%)
81% on MMMU-Pro for multimodal understanding (versus GPT-5.1’s 76.0% and Claude’s 68.0%)
87.6% on Video-MMMU (versus GPT-5.1’s 80.4% and Claude’s 77.8%)
1501 Elo score on LMArena (the highest ever recorded)
Figure 2: Google’s benchmark testing of Gemini 3 versus other leading frontier models via its blog.
The coding performance signals the arrival of the Agentic Era at scale. Gemini 3’s 76.2% on SWE-Bench Verified (single attempt) and 54.2% on Terminal-Bench 2.0 are measurements of autonomous software engineering, and not just of code snippets. These benchmarks test whether models can independently navigate codebases, understand context across thousands of files, execute multi-step debugging workflows, and complete real-world engineering tasks without human intervention. Gemini 3’s dominance on these agentic coding benchmarks is proof that Google can deliver agents that actually work in production environments. This matters because the enterprise wedge still runs through coding, but now the question isn’t “can your model write code?” but “can your model autonomously fix bugs, refactor systems, and ship features?”
Google’s ability to embed agentic coding intelligence directly into Cloud, Workspace, and developer workflows should create compounding advantages. When the model that autonomously completes engineering tasks is also integrated into your IDE, version control, and cloud infrastructure, you are building a fundamentally different relationship where the AI becomes embedded in the engineering organization itself. This enables qualitatively different applications: holding an entire product development cycle in context, reasoning across design documents, user research, prototypes, and code simultaneously.
However, while Gemini 3 Pro leads on many no-tool benchmarks, models like Kimi K2 offer close competition in agentic scenarios, outperforming it on Humanity’s Last Exam with 44.9% (using tools and thinking modes) versus Gemini’s 37.5% baseline (45.8% with search and code execution), underscoring the rapid evolution of open-source challengers.
Beyond coding, the PhD-level reasoning benchmarks reveal something deeper than incremental improvement. Gemini 3’s 37.5% on Humanity’s Last Exam (again) versus GPT-5.1’s 26.5% and Claude’s 13.7% represents a qualitative leap in handling the most challenging reasoning tasks humans can devise. The 91.9% performance on GPQA Diamond, which tests graduate-level science reasoning crucial for R&D in pharma and engineering, demonstrates mastery of expert-domain problems. This matters because, as models converge on 90%+ performance across standard benchmarks, the differentiator becomes reliability at the frontier of human knowledge. Gemini 3’s edge in these domains positions Google to capture the highest-value enterprise use cases, where AI deployment generates millions in R&D savings rather than incremental productivity gains. It also challenges many AI-native startups, which will find that their domain-specificity is now captured by an LLM. I call this LLM commoditization, and it is, in my view, a growing risk for many highly valued startups whose moats are eroding.
Last, multimodality. Multimodality isn’t a nice-to-have feature; it’s the prerequisite for agents that operate in real environments. Consider an AI agent managing product development: it needs to reason across user research videos, design mockups, prototype demos, customer feedback emails, code repositories, and executive strategy documents to make coherent decisions. Text-only models force workflows to bottleneck through transcription and description, losing critical information that exists only in visual or audio form. Gemini 3’s scores of 81% on MMMU-Pro and 87.6% on Video-MMMU demonstrate mastery of complex multimodal reasoning that could create a structural moat against open-source text models from China, which excel at cost-efficient language processing but lag in integrated sensory reasoning.
This accelerates a global bifurcation in AI value: abundance and commoditization in text-only models, with premium pricing power accruing to full-stack multimodal orchestrators like Google, who can deploy agents that actually perceive and reason about the world as humans do.
Beyond raw capability across these three dimensions, the critical differentiator becomes deployment at scale. Google’s day-one integration into Search (AI Mode), Gemini app (650M users), and Workspace embeds intelligence where billions already work.
Combine that with the orchestration advantage, which enables ecosystem monetization beyond API revenue: subscriptions (AI Ultra), Workspace adoption, search engagement, and enterprise licensing. Also, as model capability and intelligence commoditize faster than predicted, particularly in the face of surging competition from Chinese open-source rivals, orchestration is where the real durable growth moats will be built in the Agentic Era.
The product isn’t the model. The product is an intelligence layer enabling an ecosystem.
What Gemini 3 Means for LLMs and OpenAI
Gemini 3 marks a structural inflection point that redistributes competitive advantage across the LLM landscape. The implications extend beyond any single company, but OpenAI faces the most immediate strategic pressure.
Let’s start by breaking down the basic tokenomics.
Gemini 3 costs $2/$12 per million input/output tokens. Claude: $3/$15. GPT-5.1: $1.25/$10.
But cost-per-token misses the point. What matters is cost-per-outcome. A model solving problems correctly in one pass at $12 beats one requiring three attempts at $10. That’s just the start. Tokenomics are converging toward commodity status. As I documented with Kimi K2, a Chinese open-source competitor achieved frontier capabilities at $4.6M training cost. DeepSeek R1 demonstrated similar economics. Gemini 3 proves even proprietary models deliver SOTA without astronomical costs.
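To make the cost-per-outcome point concrete, here is a back-of-the-envelope sketch in Python. The per-million-token prices come from the list above; the token counts and attempt rates are hypothetical illustrations, not measured behavior of either model:

```python
def cost_per_outcome(input_price: float, output_price: float,
                     input_tokens: int, output_tokens: int,
                     attempts: int) -> float:
    """Effective USD cost to reach a correct result.

    Prices are per million tokens; every failed attempt still burns tokens.
    """
    per_attempt = (input_price * input_tokens +
                   output_price * output_tokens) / 1_000_000
    return per_attempt * attempts

# Hypothetical task: 50k input tokens and 10k output tokens per attempt.
one_pass = cost_per_outcome(2.00, 12.00, 50_000, 10_000, attempts=1)      # $2/$12 pricing
three_passes = cost_per_outcome(1.25, 10.00, 50_000, 10_000, attempts=3)  # $1.25/$10 pricing

print(f"one pass at $2/$12:        ${one_pass:.4f}")      # $0.2200
print(f"three passes at $1.25/$10: ${three_passes:.4f}")  # $0.4875
```

Under these illustrative assumptions, the nominally cheaper model ends up more than twice as expensive per solved task.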
For OpenAI specifically, this shift exposes three vulnerabilities:
Technical supremacy is gone as a differentiator. Benchmark leadership now rotates between labs within weeks, so model quality alone no longer sustains pricing power.
Consumer and enterprise adoption face Google’s distribution. ChatGPT’s 800M weekly users are impressive, but Gemini’s 650M monthly users from a standing start demonstrate Google’s power. More critically, Gemini’s embedding across Search (2B users of AI Overviews in Google Search per month), Android (>3B devices), and Workspace (3B+ users) creates ambient availability that ChatGPT can’t match. When AI is embedded in tools people already use, standalone apps face adoption battles as embedded ecosystems drive higher retention and lower acquisition costs.
The “unbundling-rebundling” strategy for applications (both consumer and enterprise) requires time that OpenAI may not have. As I laid out when OpenAI Jobs launched, unbundling means deconstructing traditional workflows (e.g., resume building, job searching, and recruiting on platforms like LinkedIn) into discrete, AI-optimized tasks. Rebundling then integrates these into a conversational AI ecosystem (e.g., ChatGPT as a “super-assistant”) that orchestrates complex activities across domains. It is my conviction that this is the only way for OpenAI to build a durable growth moat.
The market hasn’t fully processed the implications yet, but the signals are accumulating, and they’re increasingly difficult to ignore. The GPT-5.1 release, which should have been a triumphant moment, garnered less excitement than previous launches. It didn’t help that it arrived between Kimi K2 Thinking and Gemini 3.
The narrative velocity has shifted, and the perception change matters enormously. Two years ago, industry analysis framed the race as “Who will beat OpenAI?” Now it’s “How will OpenAI respond to Google?” That framing shift compresses valuations and makes fundraising harder. Markets reward momentum and leadership; once the narrative shifts to “defending position,” expectations adjust downward.
This doesn’t mean OpenAI is finished. The company retains formidable advantages: world-class engineering talent, significant capital resources, strong brand recognition, and valuable consumer momentum through ChatGPT. Multiple viable paths forward exist. The question is whether any of them support a $500 billion valuation (and a projected $1T valuation at IPO) in a world where technical moats erode faster than new ones can be built.
The broader pattern matters more than any single company’s fate. Gemini 3 validates that frontier AI capabilities no longer require any single company’s infrastructure, training approach, or data advantages. When Google, Chinese labs, and Anthropic can all produce competitive frontier models through different paths (custom silicon, open-source innovation, focused enterprise deployment), the industry structure shifts from potential monopoly to inevitable oligopoly.
What This Means for Hardware and the Infrastructure Layer
Gemini 3 creates a paradox for NVIDIA.
Google’s Gemini 3 is the first state-of-the-art model trained entirely on non-NVIDIA hardware. Google’s Tensor Processing Units (TPUs) delivered frontier performance without a single H100 in the training stack. This validates that alternatives exist and proves scaling laws remain intact: more compute yields better models.
More compute: Good for NVIDIA. OpenAI, Anthropic, xAI, Meta, and enterprise buyers are all increasing budgets. This triggers a capex race among hyperscalers.
More competition: Bad for NVIDIA. All of these big frontier players would love to reduce their dependency on NVIDIA. Google has shown them it can be done. If Google achieves SOTA on TPUs, Microsoft, Amazon, and others see a pathway to vertical integration. Gemini 3 proves alternatives (TPUs) can deliver frontier performance, validating custom silicon investments by hyperscalers.
Meanwhile, this gives Google a chance to nibble away at NVIDIA’s business. For instance, earlier this year, Anthropic struck a deal for “tens of billions of dollars” for Google’s TPU AI chips.
For now, NVIDIA continues to report blistering demand for its AI chips. Last week, the company said Q3 2025 revenue grew 62% to $57 billion compared to the same period one year ago, and jumped 8% from the prior quarter. It is forecasting revenue to increase 65.4% YoY for the current quarter.
However, the risk is that over the medium to long term, structural dynamics may shift as custom silicon improves and pricing power erodes. Until then, Blackwell has immense runway.
Now let’s turn this lens towards Google to see what this full-stack advantage delivers. Along with some of the other moats and value-add services I’ve already mentioned, there are some other critical layers at work here.
The first opportunity is inference.
Beyond the TPUs themselves, Google has built the JAX AI stack, an open-source, compiler-first framework spanning the entire ML lifecycle from training to inference.
Figure 3: JAX AI stack and ecosystem components
This matters for two reasons.
First, economics: when you control the silicon, compiler, and serving infrastructure, inference costs can be an estimated 20-30% cheaper than for competitors relying on NVIDIA chips and third-party infrastructure. Training is one-time capex; inference is daily opex that scales with usage. As AI reaches trillions of daily inferences, this cost advantage compounds at scale.
Second, adaptability: unlike PyTorch’s reliance on hand-optimized CUDA kernels that need rewriting for new architectures, JAX’s compiler-first design generalizes to novel patterns automatically. Google’s strategy of open-sourcing the development stack (JAX, MaxText, vLLM-TPU) to accelerate ecosystem adoption, then capturing economics through infrastructure deployment, creates a virtuous cycle. Every model trained on JAX increases demand for TPU inference. Chinese labs targeting JAX compatibility demonstrate this dynamic.
The pattern is clear: open innovation in model development, proprietary capture in infrastructure deployment. OpenAI and Anthropic, tied to NVIDIA infrastructure and closed models, cannot easily replicate this simultaneous cost leadership and ecosystem participation.
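The inference-economics argument above can be sketched as a toy model. The daily volume, per-inference cost, and the 25% saving (midpoint of the 20-30% estimate) are all illustrative assumptions, not Google’s actual numbers:

```python
# Toy model: annual inference opex at hyperscale volume.
DAILY_INFERENCES = 1_000_000_000  # assumption: 1B inferences per day
COST_PER_INFERENCE = 0.001        # assumption: $0.001 each on rented third-party infrastructure
VERTICAL_DISCOUNT = 0.25          # midpoint of the 20-30% estimate above

baseline = DAILY_INFERENCES * COST_PER_INFERENCE * 365  # opex on a third-party stack
integrated = baseline * (1 - VERTICAL_DISCOUNT)         # opex with owned silicon, compiler, serving

print(f"baseline annual opex:   ${baseline / 1e6:,.1f}M")
print(f"integrated annual opex: ${integrated / 1e6:,.1f}M")
print(f"annual savings:         ${(baseline - integrated) / 1e6:,.1f}M")
```

Unlike training capex, this saving recurs every year and scales linearly with usage, which is why a 20-30% edge in serving costs translates into pricing room that competitors renting their infrastructure cannot match.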
Conclusion: Discontinuity Defined
However this plays out eventually, in the short term this kind of upheaval is healthy.
This sudden reversal of fortune validates that the AI race will be oligopolistic and not a monopoly, though it won’t be crowded either. Oligopoly drives continuous innovation while preserving economics for R&D. Seven to ten companies globally can sustain this.
Winners need infrastructure sovereignty (Google’s TPUs and custom silicon, providing cost advantages), orchestration depth (coordinating complex workflows), or distribution reach (Google’s Search/Android/Workspace embedding). Companies with two or three advantages will capture disproportionate value.
Gemini 3 delivered three important discontinuity proof points; the coming months will likely reveal how they reshape competitive dynamics:
Expanding the proliferation of AI across use cases and geographies. Google’s infrastructure enables simultaneous worldwide deployment, compounding its distribution advantages. In addition, the native multimodal capability, combined with pricing, should enable a new breed of use cases and applications.
Redefining what matters in AI competition. Model performance remains important but insufficient for sustainable advantage. Orchestration quality now determines outcomes. Can the system reliably coordinate multi-step workflows without drift? Does it maintain context across hundreds of tool calls? Distribution reach compounds these advantages.
Upending the infrastructure layer where public markets have the most at stake. Gemini 3 validates that scaling laws remain intact, proving that more compute yields better models. NVIDIA’s near-term prospects look exceptional. But training Gemini 3 on TPUs rather than NVIDIA GPUs validates vertical integration strategies across the industry. More importantly, it shifts the cost curve for inference, the far larger and more durable revenue pool.
In terms of that LLM oligopoly, it looks quite fluid compared to the conventional wisdom of six months ago:
Google will be in that oligopoly thanks to its combination of model quality, custom silicon, global infrastructure, and embedded distribution.
OpenAI’s position is less certain than many had assumed.
Anthropic occupies an interesting middle ground: it lacks Google’s distribution but has established enterprise trust, and MCP adoption by competitors transforms a technical standard into a strategic asset.
Chinese labs such as Moonshot, DeepSeek, Qwen, and others demonstrate that frontier capabilities no longer require American capital or NVIDIA chips. Their open-source releases accelerate commoditization while creating strategic opportunities for cloud providers who can optimize inference deployment.
The future isn’t written, but Gemini 3 just made several scenarios far more probable and has forced us to re-evaluate which companies are positioned for them. Those who recognize this discontinuity early, position accordingly, and build for Act II’s competitive dynamics will capture asymmetric returns. Those who don’t will discover that yesterday’s moats have become today’s liabilities.