DeepSeek: GenAI’s Punctuated Equilibrium?
A special edition of this newsletter with a proposed framework for understanding DeepSeek's recent breakthroughs and its business model impact.
The theory of “punctuated equilibrium,” popularized by evolutionary biologist Stephen Jay Gould, holds that long periods of stability are occasionally interrupted by bursts of rapid change. In essence, it suggests that significant evolutionary leaps happen quickly rather than through steady accumulation over time.
As we witness the aftermath of DeepSeek's recent breakthrough in generative artificial intelligence (“genAI”), this biological principle offers a compelling framework for understanding the seismic shifts in the technology sector.
What just happened?
Chinese entrepreneur Liang Wenfeng founded DeepSeek in 2023. Liang, who studied electronic information engineering at Zhejiang University, co-founded the quantitative hedge fund High-Flyer in 2015, where he applied artificial intelligence to financial trading. Leveraging his expertise in AI and finance, Liang established DeepSeek to advance artificial intelligence research and development.
The company embraced open source and took a different approach to building models, one focused on software-driven efficiency rather than massive computational resources. That approach challenged the entire economic calculus of building large language models (LLMs).
So, the release on Friday of DeepSeek-R1 - which appears to match the performance of OpenAI's o1 on some metrics while being an estimated 20 to 50 times cheaper to use, depending on the task - sent shockwaves through markets, triggering significant value destruction for companies that had been riding the genAI boom.
The tech-heavy Nasdaq fell 607.47 points, and a major semiconductor stock index had its biggest drop in four years yesterday. Nvidia (-17%) lost $600bn in market cap, the largest one-day loss in market history, while Broadcom (-17%) lost $200bn in value.
Understanding the DeepSeek Discontinuity
That brings us back to punctuated equilibrium. The past year has marked extraordinary progress in AI reasoning capabilities, an evolutionary pace that is almost too rapid to fully comprehend.
Google DeepMind's Gemini 1.5, unveiled in February 2024, demonstrated remarkable advances in multi-step reasoning and task composition. Anthropic's Claude 3 Opus followed in March, setting new benchmarks in complex analytical tasks and abstract reasoning. Then came OpenAI's Strawberry model in September, released as "OpenAI o1," which pushed the boundaries of reasoning capabilities even further. While these developments sparked renewed speculation about artificial general intelligence (AGI), they remained firmly in the realm of narrow AI, albeit with increasingly sophisticated capabilities.
All three relied on massive model architectures and extensive computational resources. This created a soaring market for companies that provided the infrastructure and seemed to create a massive barrier for anyone seeking to compete. The capstone of this outlook came with the announcement of the $500bn Stargate project led by OpenAI, a gargantuan figure that promised to create an unassailable moat around the compute infrastructure needed to compete.
And yet, just a couple of days later, DeepSeek appeared to have breached that moat.
DeepSeek upended the prevailing investment paradigm by demonstrating that smaller models can achieve sophisticated reasoning through advanced reinforcement learning techniques, without the extensive unsupervised pre-training runs that demand huge fleets of costly GPUs.
DeepSeek's achievement represents a fundamental rethinking of that costly approach, one likely to put another significant dent in token prices. This matters because API tokens are the primary way LLM capabilities are monetized beyond subscriptions.
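To make the token-price arithmetic concrete, here is a minimal sketch of how per-task API cost is computed from token counts. The prices are illustrative figures in the range publicly reported at the time, not authoritative quotes; check each provider's pricing page for current rates.

```python
# Illustrative per-task API cost comparison for a reasoning query.
# Prices are assumed USD per million tokens - approximations of
# reported list prices, not official figures.
PRICES = {
    "openai-o1": {"input": 15.00, "output": 60.00},
    "deepseek-r1": {"input": 0.55, "output": 2.19},
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, given input/output token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A reasoning-heavy task: short prompt, long chain-of-thought answer.
o1 = task_cost("openai-o1", input_tokens=2_000, output_tokens=10_000)
r1 = task_cost("deepseek-r1", input_tokens=2_000, output_tokens=10_000)
print(f"o1: ${o1:.2f}  R1: ${r1:.4f}  ratio: {o1 / r1:.0f}x")
```

Under these assumed prices the ratio lands around 27x, squarely inside the 20-to-50x range cited above; the exact multiple depends on the input/output token mix of the task.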
The contrast not only lays bare a fundamental technical question about AI development but also highlights the increasingly complex interplay between technological advancement and great power competition.
Implications for the AI Industry and AI Enterprise Adoption
The implications for the AI industry are profound, potentially challenging the $157 billion valuation recently placed on OpenAI. Token prices had already collapsed on pre-reasoning models before a massive uptick with Strawberry, and the assumption had long been that they would eventually be commoditized. That steady decline has now been abruptly accelerated, reinforcing the point that technological advantages in language models are temporary. The real competitive moat lies in integration with client workflows and systems.
That could be great news for organizations that can now contemplate breaking free from closed API providers - a shift that could upend the current power dynamics in AI deployment. Those organizations will need to rethink their technological choices.
With reasoning-capable models now smaller and open source, the barriers to putting AI into production at the enterprise level fall sharply. For enterprises, the calculus for deployment changes, and open source will likely gain traction as a legitimate alternative.
However, successful production deployment still requires robust MLOps infrastructure, security protocols, and technical expertise.
From a data security standpoint, the fact that the model is Chinese does not by itself create a direct risk of China collecting data. As with any open-source model, data security will depend on where and how the model is deployed, what security measures are in place, how the infrastructure is configured, and whether the model makes external calls.
Winners and losers: assessing the value dispersion at stake
The DeepSeek eruption could reshape the competitive landscape across multiple layers of the technology stack. Let’s examine how.
The timing is striking: DeepSeek's announcement comes just weeks after OpenAI's record fundraising discussions, highlighting the volatile nature of technological moats in this sector. At the foundation model level, established players like OpenAI and Anthropic face new competition, while open-source companies like Mistral AI must contend with a rival that combines accessibility with advanced reasoning capabilities.
The concentrated power of API providers like OpenAI and Anthropic could be dispersed as open-source solutions boom. As HuggingFace CEO Clement Delangue noted, the platform has seen 500 derivative models of DeepSeek-R1 and 2.5 million downloads in just a few days.
This shift particularly benefits middle-layer companies focused on MLOps and AI deployment. Firms like Databricks are well-positioned as organizations increasingly move AI into production environments. The democratization of powerful AI models will only accelerate the need for robust infrastructure to manage and optimize these systems. The ability to deploy more efficient models could also lead to new approaches in model optimization and deployment strategies, potentially creating opportunities for specialized MLOps tools focused on maximizing the performance of these smaller, more focused models.
For infrastructure providers, the picture is more nuanced. While GPU purchases may become more dispersed - shifting from large API providers to enterprise customers - and fall in volume, this also represents an opportunity for Nvidia to diversify its customer base and expand its inference solutions, perhaps even opening new revenue streams whose magnitude has yet to be determined.
The shock is immense, but if Nvidia plays offense in this great infrastructure value migration, it has a strong case to make. Indeed, the company tried to make that argument in a public statement issued after markets closed on Monday. The shift toward more efficient models could also accelerate the development of specialized AI hardware optimized for these new architectures. Cloud providers like AWS and Azure, once they weather the current storm of infrastructure oversupply, could benefit from serving a more distributed model of AI deployment. In the long term, though, commoditization of chips should prevail.
Importantly, this evolution could yield environmental benefits. Smaller, more efficient models require less energy than their frontier counterparts, potentially reducing the AI industry's carbon footprint sooner than anticipated. Initial estimates suggest these architectures could cut energy consumption by an order of magnitude relative to traditional large language models while maintaining similar performance. Even though this challenges assumptions about absolute energy spend, it creates opportunities for data centers and energy providers that can adapt to serve a more distributed infrastructure model.
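A back-of-the-envelope sketch shows what an order-of-magnitude efficiency gain means at fleet scale. All figures here - per-query energy, query volume - are illustrative assumptions chosen only to make the arithmetic concrete, not measurements of any specific model or provider.

```python
# Back-of-the-envelope inference energy comparison. Every number below
# is an illustrative assumption, not a measured figure.
WH_PER_QUERY_FRONTIER = 3.0    # assumed Wh per query for a large frontier model
EFFICIENCY_GAIN = 10           # the "order of magnitude" improvement claimed
QUERIES_PER_DAY = 100_000_000  # assumed global daily query volume

def daily_energy_mwh(wh_per_query: float, queries: int) -> float:
    """Daily energy use in megawatt-hours (1 MWh = 1,000,000 Wh)."""
    return wh_per_query * queries / 1_000_000

frontier = daily_energy_mwh(WH_PER_QUERY_FRONTIER, QUERIES_PER_DAY)
efficient = daily_energy_mwh(WH_PER_QUERY_FRONTIER / EFFICIENCY_GAIN, QUERIES_PER_DAY)
print(f"frontier: {frontier:.0f} MWh/day  efficient: {efficient:.0f} MWh/day")
```

Under these assumptions, a tenfold efficiency gain takes the fleet from hundreds of MWh per day to tens - the same shape of saving applies whatever the true baseline turns out to be.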
Traditional software companies, whose value lies in business applications rather than AI capabilities per se, may find their positions largely unchanged or even strengthened. The availability of more efficient AI models provides another tool for enhancing their products without fundamentally altering their business models. The reduced infrastructure requirements could also accelerate the integration of AI capabilities into existing software products, potentially leading to more rapid innovation in applied AI solutions.
Looking forward
DeepSeek's breakthrough warrants both excitement and scrutiny. As smaller models achieve advanced reasoning, the barriers to developing powerful AI systems will lower significantly - if the efficiency claims hold true. While early benchmarks are promising, the AI community still awaits peer verification of DeepSeek's results, particularly around the model's performance consistency and resource usage claims.
Yet the regulatory implications are already clear: frameworks designed to monitor a handful of large players with massive compute resources may need rapid revision. More distributed AI development could make oversight more challenging, but potentially less urgent - smaller models might be inherently more controllable and auditable than their massive counterparts. The key is finding meaningful measures of AI capability and risk.
Lessons learned
This moment of punctuated equilibrium in AI's evolution offers a crucial lesson about efficiency as a cornerstone of sustainable growth in disruptive technology. Resource constraints, while challenging, often drive innovation and force companies to make difficult but necessary decisions about allocation. The early success of DeepSeek's smaller, more efficient models demonstrates that bigger isn't always better – a timely reminder as the industry navigates the early stages of the generative AI revolution.
As we witness this Discontinuity in the technology landscape, the winners will not necessarily be those with the largest budgets or most extensive infrastructure. Instead, success will likely flow to those who can most effectively integrate these more efficient AI capabilities into solutions that deliver real value to customers.
This Darwinian moment in AI's evolution may well set the stage for the next phase of growth in the industry – one built on the foundation of efficiency rather than scale alone.