Claude Is Building Claude: Where Scarcity Migrates When the Intelligence Factory Goes Autonomous
Anthropic disclosed that it is approaching “recursive self-improvement” in developing its models. The real story is what it means for scarcity and moats as the company prepares to go public.

Anthropic’s Recursive Self-Improvement (RSI) research (i.e., AI autonomously designing and improving its own successors) isn’t about “superintelligence” or a “pause.” It is the seventh “tremor” in the shift from tools to goal-seeking agents. Intelligence is commoditizing fast, and both it and the Harness (routing, memory, orchestration code) are becoming self-improving. Real scarcity, which defines durable value and moats, migrates to the two true bottlenecks: scarce compute (which RSI actually intensifies) and operational context (the real-world data, outcomes, and judgment that labs can’t automatically generate or own). RSI accelerates everything except the parts enterprises control, giving Anthropic’s enterprise/Harness lead a powerful booster. But that lead is not an unassailable moat yet.
Even with the historic SpaceX IPO scheduled for later this week and rival OpenAI confidentially filing for its own public offering, Anthropic once again managed to hijack the public discussion about artificial intelligence. As is often the case in these early days of the Agentic Era, I think that the public conversation has tended to focus on the more immediate and dramatic aspects, while the deeper economic significance has received less attention.
In this case, the conversation stopper came not in the form of a new product but in a research paper published by Anthropic’s research arm (the Institute): “When AI builds itself.” The paper is the company’s most detailed public account yet of how it is developing what it calls “recursive self-improvement” (RSI), which it describes as “an AI system capable of fully autonomously designing and developing its own successor.” It landed weeks after Anthropic scored a recruiting coup by hiring legendary AI researcher Andrej Karpathy, a founding member of OpenAI and formerly of Tesla, who had recently been working on his own RSI project. His brief: build a team to accelerate the pre-training of the next Claude.
Because the Anthropic paper discussed the “implications” of such a system, the world focused almost solely on a single conversation: whether Anthropic was calling for a pause in AI development.
It seemed astonishing that the company would call for such a thing, considering that on June 1, Anthropic confirmed it had confidentially submitted a draft S-1, which reportedly included a $47 billion ARR, and had announced a $965 billion funding round.
So, was this a savvy PR exercise, timed amid intense competition for attention to steal the spotlight from the $1.75 trillion SpaceX IPO? Or a branding exercise to reinforce Anthropic’s image as the good-governance AI lab?
Of course, the first problem is that Anthropic wasn’t really calling for a pause. But I will leave it to Gary Marcus to explain why.
However, from my perspective, the larger implication of this announcement flows back to the framework of Orchestration Economics that I have been building over the past three years. In the Orchestration Economics Manifesto, I define generative and agentic AI as the Discontinuity: a geology built from a sequence of six shocks along a fault line, each amplifying the others until the boundary gave way. The result was a structural break, as machines crossed from being tools that process instructions to actors that pursue goals.
RSI represents the seventh tremor in that continuum. Again, the instinct is to view RSI as a capability story, perhaps a step toward superintelligence or AGI. This also misses the point. As with the other six tremors, the real question is what happens to value and moats when the production of intelligence itself becomes recursive.
Every major shift in economic history has been accompanied by a migration of scarcity. When a factor becomes abundant, value moves elsewhere. If recursive self-improvement works, even partially, it does not merely accelerate intelligence. It accelerates the search for whatever intelligence cannot produce.
That is why Anthropic’s announcement matters. Not because Claude may someday build Claude, but because it gives us a glimpse of where scarcity goes next and where value and moats are migrating: further away from the Model layer, more firmly into the Harness, and closer to the Orchestration layer. And, naturally, to compute.
Recursive self-improvement as a tremor in the Discontinuity
Let’s start with two important bits of context before I dive back into Claude and RSI.
Between September 2024 and February 2026, I counted six tremors: intelligence arriving, intelligence becoming cheap, silicon independence, protocol standardization, the inference swarm, and the long-context frontier. They arrived separately and reinforced one another into a single structural break.

Recursive self-improvement is the seventh. And yet, its character is different.
The earlier shocks expanded the supply of intelligence, made it cheaper, or taught it to coordinate. RSI changes the machine that produces the intelligence in the first place, the engine of the whole sequence. If it starts building better versions of itself, everything compounds.
Anthropic is careful to frame this as a fork, not a fait accompli, and sketches three scenarios:
Scenario 1: The trend stalls. The gains plateau and capability diffuses without a runaway.
Scenario 2: The one we are visibly in. Efficiency gains compound, but humans still set the direction, choosing which problems matter and judging the results.
Scenario 3: The loop closes. AI designs and trains its own successor, end to end, and the pace of progress becomes bounded only by the availability of compute.
The real question is the distance between the second and third scenarios, because that is where the human stops being the one who decides. The seventh tremor is the crossing toward the third.
The Institute does not ask us to take this progress on faith. The numbers are disclosed. As of May 2026, more than 80% of the code Anthropic merges is written by Claude. That’s up from low single digits before Claude Code shipped in February 2025. Anthropic leadership has put the looser figure north of 90%. The typical engineer merges roughly eight times as much code per day as in 2024. In April, Claude shipped over 800 fixes that cut one class of API errors a thousandfold, work the supervising engineer estimated would have taken a human four years.
Lines of code are a crude proxy, and Anthropic says so itself. The median engineer self-reports only ~4× uplift, and Anthropic cites METR’s finding that developers overestimate AI productivity gains. A competing arXiv analysis (2602.04836) argues the METR curve may not be cleanly exponential. None of this changes the direction (code authorship above 80%, quality at parity and rising), and the argument rests on the direction and the mechanism, not on any single headline multiple.
Now let me add the second bit of context: the Three Rings.
If those six tremors turned machines into actors, the central question of the Agentic Era is: Who directs them? The Agentic Enterprise is organized around three rings: Intelligence, the Harness, and the Orchestration Layer.
When applying Orchestration Economics, the job is to determine where a company sits within the Three Rings and whether it can move further out to capture more value, because that is the whole game. Viewed through this lens, Anthropic’s RSI disclosure tells us the following:

The most obvious leap is in Ring 1: Intelligence, the cognitive substrate. Hand Claude some model-training code and ask it to make the code run faster while passing the same checks. In May 2025, Opus 4 averaged a 3x speedup. By April 2026, Mythos Preview was hitting 52x in four to eight hours, against the 4x a skilled human achieved in the same time. On that narrow, verifiable task, the model went from helpful to superhuman in under a year. Measured from the outside, METR finds that the length of task a model can complete unsupervised is now doubling roughly every four months, compared with every seven months previously. Opus 4.6 handles twelve-hour tasks, and Mythos Preview reportedly ran for sixteen hours.
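What a shorter doubling period means is easiest to see as arithmetic. The sketch below assumes a clean exponential (which, as noted above, is contested) and treats the twelve-hour figure purely as an illustrative starting point; the one-year projection is an extrapolation under that assumption, not a reported result.

```python
def annual_growth(doubling_months: float) -> float:
    """Growth factor in task length over 12 months for a given doubling period."""
    return 2 ** (12 / doubling_months)


def task_horizon(start_hours: float, months: float, doubling_months: float) -> float:
    """Task length after `months`, assuming a clean exponential."""
    return start_hours * 2 ** (months / doubling_months)


# A four-month doubling compounds three times a year (2**3 = 8x);
# a seven-month doubling compounds fewer than twice (about 3.3x).
print(f"4-month doubling: {annual_growth(4):.1f}x per year")
print(f"7-month doubling: {annual_growth(7):.1f}x per year")

# Hypothetical extrapolation: a 12-hour horizon, one year out, at a 4-month doubling.
print(f"12h today -> {task_horizon(12, 12, 4):.0f}h in a year")
```

The gap between 8x and roughly 3.3x a year is why a change in the doubling period matters more than any single headline task length.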
The more interesting leap is in Ring 2: The Harness. This is the routing engine that decides where every unit of work flows. The Harness decomposes a goal, spawns sub-agents, manages state, retrieves memory, and reviews output. The substrate, the intelligence, is commoditizing. The model is not the moat. The labs know this, which is why they are no longer primarily focused on building chat interfaces. Instead, they are racing to build the best Harness.
RSI is what lets them build it. The Harness is code, the one substance RSI has proven it can improve. A self-improving model can autonomously write, test, and optimize its own integration script, its own pointers, the connectors that wire it into every tool, system, and sub-agent, grading each version against a clean metric: did the task complete, faster, and more coherently?
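To make “grading each version against a clean metric” concrete, here is a minimal sketch of such a loop. Everything in it is hypothetical: a toy harness with two knobs (`max_subagents`, `memory_items`), a stand-in benchmark whose completion and latency formulas are invented, and random local edits in place of a model rewriting code. It is not Anthropic’s method; it only shows that when the scorer is automatic, propose, evaluate, and keep can run with no human in the loop.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HarnessConfig:
    # Hypothetical knobs; a real harness has far more.
    max_subagents: int
    memory_items: int


def run_benchmark(cfg: HarnessConfig) -> tuple[float, float]:
    """Stand-in for an automated test suite: returns (completion_rate, seconds).

    Invented toy model: sub-agents help up to 4, memory helps up to 32 items,
    and every extra sub-agent or memory item adds latency.
    """
    completion = min(cfg.max_subagents, 4) / 4 * 0.6 + min(cfg.memory_items, 32) / 32 * 0.4
    seconds = 10 + 2 * cfg.max_subagents + 0.1 * cfg.memory_items
    return completion, seconds


def score(cfg: HarnessConfig) -> float:
    """Clean metric: task completion dominates; speed breaks ties."""
    completion, seconds = run_benchmark(cfg)
    return completion - 0.001 * seconds


def mutate(cfg: HarnessConfig, rng: random.Random) -> HarnessConfig:
    """Propose a neighbouring config (stand-in for the model editing harness code)."""
    if rng.random() < 0.5:
        return replace(cfg, max_subagents=max(1, cfg.max_subagents + rng.choice([-1, 1])))
    return replace(cfg, memory_items=max(0, cfg.memory_items + rng.choice([-8, 8])))


def improve(cfg: HarnessConfig, steps: int, seed: int = 0) -> HarnessConfig:
    """Greedy hill climbing: keep a candidate only if the automatic scorer prefers it."""
    rng = random.Random(seed)
    best, best_score = cfg, score(cfg)
    for _ in range(steps):
        candidate = mutate(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best


start = HarnessConfig(max_subagents=1, memory_items=0)
tuned = improve(start, steps=200)
print(start, round(score(start), 4))
print(tuned, round(score(tuned), 4))
```

Greedy acceptance is the simplest possible choice; a real system would more plausibly score against held-out tasks to avoid overfitting its own benchmark.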
When 80% of Anthropic’s merged code is written by Claude, much of that code is not the model; you do not write a model in a pull request. It is the Harness: Claude Code, built by Claude.
The model is becoming the engine that wires itself into everything and then optimizes the wiring. That is what it means to become the central routing engine of the orchestration graph: not a smarter chatbot, but the node that decides where work goes and is increasingly written by the thing it routes.
We saw how deep this runs in late March 2026, when Anthropic accidentally exposed the source code of Claude Code: some 1,900 files, including a full agent runtime. It was the orchestration thesis rendered in code. Its most telling component is the pointer system, a small, always-loaded index governing what the model loads, fetches, or forgets across a long task. That is routing-and-memory logic of exactly the kind a self-improving model can rewrite against its own metrics by lunchtime.
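The leaked code is not reproduced here, so what follows is a hypothetical sketch of the general pattern the pointer system is described as implementing: a small index that is always in context, full items fetched on demand into a size-limited working set, and least-recently-used items forgotten when space runs out. The class name, the character budget, and the LRU eviction policy are all assumptions for illustration, not details of Claude Code.

```python
from collections import OrderedDict


class PointerIndex:
    """Hypothetical sketch: an always-loaded index of pointers to larger stored items."""

    def __init__(self, store: dict[str, str], budget_chars: int) -> None:
        self.store = store  # full contents live outside the context window
        self.index = {k: v[:40] for k, v in store.items()}  # short summaries, always loaded
        self.budget = budget_chars
        self.working: OrderedDict[str, str] = OrderedDict()  # what is currently "loaded"

    def used(self) -> int:
        return sum(len(v) for v in self.working.values())

    def fetch(self, key: str) -> str:
        """Load an item into the working set, forgetting least-recently-used items to fit."""
        if key in self.working:
            self.working.move_to_end(key)  # mark as recently used
            return self.working[key]
        item = self.store[key]
        if len(item) > self.budget:
            raise ValueError(f"{key!r} exceeds the working-set budget")
        while self.used() + len(item) > self.budget:
            self.working.popitem(last=False)  # forget the oldest item
        self.working[key] = item
        return item

    def forget(self, key: str) -> None:
        """Explicitly drop an item; its pointer stays in the index."""
        self.working.pop(key, None)


store = {"plan": "p" * 50, "schema": "s" * 60, "log": "l" * 40}
idx = PointerIndex(store, budget_chars=100)
for key in ("plan", "schema", "log"):
    idx.fetch(key)
print(list(idx.working))  # ['schema', 'log']: 'plan' was forgotten to make room
```

Because the eviction rule and budget are ordinary code with measurable effects (task completion, tokens spent), they are exactly the kind of logic an automated loop can tune.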
That is the agent improving the scaffolding that runs the agent, and it is how the lab compounds its central role in the orchestration graph.
Claude is not only helping improve future versions of Claude. It is also helping improve the software that surrounds Claude: the routing systems, memory layers, evaluation frameworks, and orchestration logic that determine how intelligence is deployed.
These are two different forms of recursive self-improvement, and they operate on very different clocks.
Improving the model itself requires new training runs, new compute, and months of iteration. Improving the orchestration layer requires only code, tests, and evaluation. One loop is slow. The other can run continuously. This gives RSI two loops at different speeds.
The inner loop, the model improving the model, is gated by compute and training runs, which are slow, expensive, and episodic. The outer loop, the model improving the Harness, is gated only by code and evaluation, which are fast, cheap, and continuous, and its gains ship between releases without a new pre-training run. The outer wheel turns faster, and it is where the lab banks a position, because the model commoditizes.
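A toy model makes the difference between the two clocks visible. It assumes, purely for illustration, a model release every 180 days worth a 20% gain, and harness improvements worth 0.1% that ship every day the automated checks pass; none of these rates comes from Anthropic. Effective performance is modeled, again as an assumption, as the product of model quality and harness efficiency.

```python
def simulate(days: int, release_every: int, model_gain: float, harness_gain: float) -> list[float]:
    """Toy model: effective performance = model quality x harness efficiency."""
    model, harness = 1.0, 1.0
    history = []
    for day in range(1, days + 1):
        harness *= 1 + harness_gain  # outer loop: small, continuous, ships daily
        if day % release_every == 0:
            model *= 1 + model_gain  # inner loop: episodic, one jump per training run
        history.append(model * harness)
    return history


perf = simulate(days=365, release_every=180, model_gain=0.20, harness_gain=0.001)
releases = 365 // 180
print(f"model releases in a year: {releases}")
print(f"gain from the inner loop alone: {1.20 ** releases:.2f}x")
print(f"gain from the outer loop alone: {1.001 ** 365:.2f}x")
print(f"combined: {perf[-1]:.2f}x")
```

The split between the two loops depends entirely on the assumed rates. What does not depend on them is that performance keeps rising on days with no model release at all.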
Ring 1 is converging across five vendors, while the routing engine is proprietary. A model that improves the system that coordinates models is the most valuable self-improvement the labs have, and the least visible on a leaderboard.
However, recursive self-improvement only works where success can be measured automatically.
This distinction is crucial because it means the important boundary is not between the model and the orchestration layer. RSI can improve both. A model can evaluate whether code runs faster. It can evaluate whether an agent completed a task. It can evaluate whether memory retrieval has improved.
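Here is what “measured automatically” looks like for the Ring 1 task described earlier: making code faster while passing the same checks. This is a minimal sketch under assumed names (`verified_speedup` and the toy `reference`/`candidate_*` functions), not Anthropic’s evaluation harness. A candidate earns a score only if it reproduces the reference outputs, and the score is then a plain timing ratio. An operational decision in Ring 3 has no `reference` function to compare against.

```python
import timeit
from typing import Callable, Optional, Sequence

Func = Callable[[list[int]], int]


def verified_speedup(reference: Func, candidate: Func,
                     test_inputs: Sequence[list[int]], repeats: int = 5) -> Optional[float]:
    """Speedup of `candidate` over `reference`, or None if any check fails."""
    for x in test_inputs:
        if candidate(x) != reference(x):
            return None  # a faster wrong answer scores nothing

    def run_all(fn: Func) -> None:
        for x in test_inputs:
            fn(x)

    ref_time = min(timeit.repeat(lambda: run_all(reference), number=1, repeat=repeats))
    cand_time = min(timeit.repeat(lambda: run_all(candidate), number=1, repeat=repeats))
    return ref_time / cand_time


def reference(xs: list[int]) -> int:
    total = 0
    for v in xs:
        total += v * v
    return total


def candidate_ok(xs: list[int]) -> int:
    return sum(v * v for v in xs)


def candidate_wrong(xs: list[int]) -> int:
    return sum(xs)  # cheap, but fails the check


inputs = [list(range(n)) for n in (10, 1_000, 10_000)]
print(verified_speedup(reference, candidate_ok, inputs))     # a ratio; varies by machine
print(verified_speedup(reference, candidate_wrong, inputs))  # None
```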
The boundary sits between the orchestration layer and operational reality.
Ring 3 is where we find operational context, which has no impartial scorer. Whether a decision was right lives in the world, arrives on the slow clock of realized outcomes, and is owned by whoever runs the operation. The machine-turned-actor cannot retrospectively evaluate whether an insurance policy was correctly underwritten, whether a supply chain decision was optimal, or whether a loan should have been approved.
Those outcomes are determined in the real world. There are no clean, verifiable metrics to be optimized. One of the keys to Anthropic’s rapid progress has been recognizing that these loops need such metrics and must also run in narrowly defined boxes that allow for that feedback. Ring 3 neither lends itself to tidy metrics nor fits neatly into such a box.
What RSI changes and where we are
Before considering the economic implications, it is worth being precise about what RSI is and what it is not.
First, RSI is not AGI or superintelligence. It is the engine you would use to reach either. It is a claim about the rate of development, specifically about how much of the AI R&D loop can be delegated to AI, not about whether the resulting system is generally intelligent or conscious. You can believe in compounding RSI while staying agnostic about AGI timelines. I do.
Second, it is not new. Google DeepMind’s AlphaEvolve, announced May 2025, was a genuine precursor, an evolutionary coding agent pairing Gemini with automated evaluators that edged past Strassen’s 1969 matrix-multiplication record, clawed back ~0.7% of Google’s worldwide compute, and improved the training of the very models underneath it. AI improving its own substrate is not a 2026 invention.
Third, the tidy “phases” framing presented by Anthropic in the paper is misleading.
Anthropic presents recursive self-improvement as the next stage in a progression from coding agents to orchestration and then to AI systems capable of improving their own successors. The framework is useful, but the boundaries are less clean than the diagram suggests.
First, coding agents are real. This is no longer speculative. The evidence is now overwhelming that frontier models can autonomously write, debug, test, and improve code across increasingly long horizons. Claude Code, OpenAI’s coding systems, and Google’s AlphaEvolve all demonstrate that AI can already contribute meaningfully to software development. We are not waiting for coding agents. We are living through them.
Second, autonomous agents are underway but unfinished. The industry’s real race is no longer to build a better chatbot but to build the Harness: the routing, memory, agent coordination, and workflow infrastructure that turns intelligence into action. This is the transition I described in the Orchestration Economics Manifesto. We are clearly moving into this phase, but the architecture is still being contested. Every frontier lab is building its own orchestration layer. The protocols are still evolving. The center of the orchestration graph has not yet been claimed.
Third, recursive self-improvement lies further along the same continuum. The evidence is increasingly compelling that AI can improve parts of the systems around it and, under tightly controlled conditions, improve components of the AI development process itself. But this remains the least mature of the three stages. Much of the evidence comes from coding environments and tasks with clear evaluators. The direction is real. The degree remains uncertain.
This is why I think the notion of neatly separated phases is misleading. Coding agents, orchestration, and recursive self-improvement are not successive rooms in a hallway. They overlap. We are already inside the first, actively building the second, and beginning to glimpse the third.
We should also not confuse demonstrations with deployments. The lab’s striking result, in which agents recovered 97% of an alignment-research gap while two humans designing every experiment themselves recovered only 23%, is a research artifact with caveats. It did not transfer to production-scale models, and humans chose the problem and wrote the rubric.
Lessons from Anthropic’s Project Vend last year offer the corrective. Claude, running an actual store as “Claudius” with no context or guardrails, lost money, sold tungsten cubes at a loss, and at one point insisted it was a human in a blazer. “We would not hire Claudius,” the researchers wrote. Drilled on procedures, it later ran respectably, until fresh red-teamers fed it falsified documents and walked it into debt.
The capability and deployment frontiers are not the same line.
The true costs of RSI
This brings us back to the question of what RSI potentially does for the labs’ economics.
The framing that RSI slashes R&D cost is only half right. Yes, RSI compresses the human labor share of research, such as salaries and engineer-hours. But the dominant line in a frontier lab’s R&D budget was never salaries. It is compute. RSI does not cut the compute bill. It re-denominates the entire cost base onto compute, and then it makes that bill bigger. The 97% experiment cost $18,000 in compute precisely because the human time fell away. The unit of production has changed from a paid researcher to a running GPU.
Making a researcher almost free does not reduce compute demand. It multiplies it. When an experiment is bottlenecked by scarce human judgment, you run a few at a time. When judgment is cheap and parallelizable, you run hundreds, each consuming compute, each competing for the same clusters.
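The arithmetic behind “it multiplies it” is short. In this sketch every number is hypothetical: a fixed 40 hours of human attention per week, 500 GPU-hours per experiment, and human time per experiment falling from a full researcher-week to a half-hour review. Because the experiment count is capped by human time, compute demand scales inversely with the human hours each experiment needs.

```python
def weekly_demand(human_hours: float, human_hours_per_exp: float,
                  gpu_hours_per_exp: float) -> tuple[float, float]:
    """Return (experiments, GPU-hours) when experiments are capped by human time."""
    experiments = human_hours / human_hours_per_exp
    return experiments, experiments * gpu_hours_per_exp


before = weekly_demand(40, 40, 500)  # a researcher designs and runs each experiment
after = weekly_demand(40, 0.5, 500)  # the researcher only reviews the result
print(before)  # (1.0, 500.0)
print(after)   # (80.0, 40000.0)
print(f"compute multiplier: {after[1] / before[1]:.0f}x")  # 80x
```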
The Institute is explicit in its third scenario. If the loop closes, “the pace of progress becomes determined entirely by the availability of compute.” So, RSI does not relieve the labs’ compute hunger. It intensifies it. A self-improving lab is a more compute-constrained lab, not a less constrained one. That is why the same companies publishing RSI papers are signing the largest power-and-silicon contracts in corporate history. The unit of value moved, and it moved to the one input that only a handful of entities on earth can secure at scale.
The change that genuinely matters, though, is not in the cost structure. It is in the position of the human. And this represents a major consequence.
Until now, the human sat at the center of the development loop: deciding, designing, executing, and judging. RSI moves the human to the edge, from operator to reviewer, and then toward observer. Doing now costs almost nothing in human time. What remains is taste, judgment, and direction-setting, and even that is eroding. On which next step to take in a stuck research session, Opus 4.5 beat the human choice 51% of the time last November. By April, Mythos Preview was at 64%. Anthropic states the trajectory plainly: once human and AI code reach parity, humans stop writing and only review. If they cannot review as fast as Claude generates, the review itself becomes the bottleneck. And then it is dropped.
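The review bottleneck is a queue. As a minimal sketch with hypothetical daily rates, in arbitrary units of “changes”: whenever generation outpaces review capacity, the unreviewed backlog grows every single day, and it stops growing only if generation slows, review capacity rises, or review is dropped.

```python
def review_backlog(days: int, generated_per_day: float, reviewed_per_day: float) -> list[float]:
    """Unreviewed changes left at the end of each day (simple queue, nothing dropped)."""
    backlog = 0.0
    history = []
    for _ in range(days):
        backlog += generated_per_day               # new changes arrive
        backlog -= min(backlog, reviewed_per_day)  # humans clear what they can
        history.append(backlog)
    return history


print(review_backlog(5, generated_per_day=60, reviewed_per_day=80))   # [0.0, 0.0, 0.0, 0.0, 0.0]
print(review_backlog(5, generated_per_day=100, reviewed_per_day=80))  # [20.0, 40.0, 60.0, 80.0, 100.0]
```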
Houston, we have a problem.
When the system that builds the system runs faster than the humans nominally supervising it, human review stops being a control and becomes a ceremony. Recursive self-improvement can then become recursive propagation of errors and harms: flaws no human inspected compound into the next generation, which the unreviewed one helped build. We are, for the first time, mass-producing the producer and removing the human from its quality control.
That is an audit, liability, and governance problem, and it sits under every enterprise that will deploy these systems into regulated work. The Institute, to its credit, says this aloud, ties it to alignment, and floats a verifiable slowdown. This was one of the “implications” the Institute raised in its paper, and while it is certainly a legitimate one, its sci-fi, scary-AI flavor was naturally catnip for mainstream media.
What does that mean in the AI race?
Strip away the drama, though, and RSI confirms the core thesis of Orchestration Economics. It reinforces the race to two scarcities and even accelerates both.
The first is operational context. RSI accelerates abundant intelligence and perfects the Harness that channels it. Abundance is not value. When a factor of production is commoditized, value migrates to whatever the abundant thing cannot produce itself. RSI cannot produce Ring 3: the insurer’s ten million adjudicated claims, the carrier’s billion routed shipments, the bank’s trillion dollars of underwriting. None of it has an automatic scorer, because the scorer is the world. The faster the core commoditizes, the more obviously the durable surplus sits in context and proximity to intent.
The second scarcity is compute, and we have just seen why RSI tightens rather than loosens it. In roughly a year, Anthropic has assembled commitments from Amazon (up to 5 GW), Google and Broadcom (multiple gigawatts from 2027), and Microsoft and Nvidia ($30bn of Azure capacity), and it has recently taken over the Colossus 1 data center in Memphis, now with Google! The orchestration leader is renting the scarcest asset of the orchestration laggard. Compute flows toward the company that earned the upstream position.
Beyond both bottlenecks, RSI also accelerates deployment. A self-improving Harness gets better, autonomously, at the two things that have always slowed enterprise AI: mapping a customer’s systems and writing integrations. Automated context mapping and dynamic integration mean an agent can survey an enterprise’s schemas, tools, and workflows and wire itself in days rather than quarters. But it does not lower the cost of connecting to the operational context, nor does it transfer ownership of it. The agent lays the pipe faster. The enterprise still owns the water, and faster integration only deepens its dependence on the routing layer controlled by the labs. Deployment accelerates; the moat does not move.
And for Anthropic?
This is the question the IPO will turn on.
First, it is a well-timed piece of communication: previously unreported internal numbers, published into an IPO race weeks after the Karpathy hire, telling the market the engine compounds at the very moment it is being priced. The research is real and the candor genuine. A clear-eyed reader holds both thoughts: it is true, and its timing is calculated.
Second, and more important: RSI is not a moat. Every frontier lab has the loop: AlphaEvolve proves Google’s; OpenAI runs its own. A capability your two largest competitors also have is table stakes, not a moat. RSI is a moat amplifier. It compounds whatever position you already hold by pouring its fastest, cheapest gains into Ring 2.
The real question is whether the position RSI amplifies is defensible. The answer is a qualified, conditional yes. Anthropic’s position is the enterprise: roughly 40% share by trackers’ estimates, eight of the Fortune 10 as customers, and a coding wedge that became the orchestration harness at the center of the routing graph. RSI compounds that lead on the fast loop, and it is a better moat than the model, because the model commoditizes and the routing engine, for now, does not.
But “for now” is doing real work. The Harness is contestable: every lab is building one, and MCP is open and has been donated to the Linux Foundation. Owning the center of the orchestration graph is a land grab still in progress. RSI raises the stakes in that land grab; it does not win it on its own.
The one layer that would be an unassailable moat is Ring 3. But that demands an operational context that RSI structurally cannot deliver to a lab for the same reason it cannot improve it. There is no scorer that the machine can run. Even maximal RSI leaves the deepest scarcity with the enterprise that generated it.
So, RSI is an accelerant for Anthropic on a real but contested advantage at the routing layer, and a reminder that the durable surplus sits one ring beyond the lab’s current reach. The moat question reduces to a race Anthropic has not yet won: can it convert a harness lead into ownership of the orchestration graph’s center before the model commoditizes and enterprises wall off their own context? RSI buys speed in that race; it does not end it.
There is a tell in the document itself: Anthropic foregrounds oversight, governance, and the option of a verifiable pause precisely because its buyer is the regulated enterprise, where those are procurement requirements. That is as much a positioning statement as a research note. But positioning is not a moat either.
Has anything fundamental changed, then? Not for anyone reading the fault line rather than the headlines. RSI is the seventh tremor, not a new earthquake. It accelerates the commoditization of abundant intelligence. It perfects the Harness, which was always contestable. It shifts the binding constraint onto compute. And it leaves the durable surplus where the Manifesto put it: in the operational context of the outermost ring, the one ring RSI cannot reach, because it is the one layer with no in-silico scorer.
The machine is starting to build the machine. That is genuinely new, genuinely unsettling, and I will not pretend otherwise, least of all when it comes to the human stepping out of the loop. Machines are becoming actors faster than ever. But ultimately, the real value is still captured by whoever directs them and owns the context that tells them what “correct” means.
For now, at least, those are also humanity’s moats.
The views and opinions expressed in this Website are those of the author alone and are based on publicly available information. The expressed views and opinions do not constitute investment advice, a solicitation, or a recommendation to buy or sell any security or financial instrument.
The author may hold positions in the securities of companies mentioned. Certain companies referenced may be current or former clients of, or counterparties to, the author or affiliated entities; such relationships will be disclosed where applicable.
Past performance is not indicative of future results. To the fullest extent permitted by applicable law, the author does not accept any liability for any loss or damage arising from reliance on this content. Readers should conduct their own independent due diligence and consult a qualified financial advisor before making any investment decision.



