Memento: The Memory Architecture Ushering in AI's Agentic Discontinuity
The birth of "execution data" as a new strategic asset in the Agentic Era.

Last week, researchers from UCL's Centre for Artificial Intelligence and Huawei Noah's Ark Lab published Memento, a framework demonstrating that agents can achieve state-of-the-art performance through sophisticated external memory without fine-tuning the underlying language model. This methodology achieved remarkable benchmarks — all while using 50-80% fewer computational resources than traditional fine-tuning approaches. Here's why this matters for the future of Agentic AI.
As progress on Agentic AI continues, one question has haunted the recent development of AI agents: how should they learn from experience?
Traditional approaches fall short. Pre-training on massive datasets produces static knowledge frozen at deployment. Human feedback loops bottleneck on scarce annotation resources. What AI agents actually require is something more organic: the ability to remember, reflect, and improve from their own interactions with the world.
This question strikes at the heart of what distinguishes true agents from sophisticated automation.
As I detailed in "Agentic Era Part 4", we're transitioning from applications that enhance human productivity to "Synthetic Colleagues." These systems can replace entire organizational functions and assume responsibility for complete workflows.
Yet, without memory, these agents are like Leonard Shelby, the main character in Christopher Nolan's film “Memento,” who suffers from anterograde amnesia, a condition that prevents him from forming new long-term memories after a head injury.
Like poor Leonard, AI agents are capable of sophisticated reasoning within each interaction but are unable to build upon past experiences, condemned to repeat the same discoveries and mistakes in an endless loop.
The urgency of solving this memory challenge has intensified. As I wrote in last December's newsletter, Microsoft CEO Satya Nadella had warned at the time that "SaaS business applications could collapse in the Agent era." He was sending a strong signal that AI agents were not iterating on existing software paradigms, but were instead replacing them entirely. Now, months later, the debate on software's future rages on. Yet the answers (and even the questions) fail to address the critical issues. One of them is how agents learn and improve after deployment.
The companies that survive this transition will be those that build what I call a "Durable Growth Moat," combining strong fundamentals with adaptive capacity that turns disruption into advantage.
By demonstrating that agents can reach state-of-the-art performance through sophisticated external memory rather than through changes to the underlying language model, the Memento research introduces what I consider a true Discontinuity: not mere disruption or innovation, but a fundamental break in established patterns of value creation.
As I argued in "What is Discontinuity?", generative AI goes beyond disruption because it requires us to conceive of what has not yet happened and imagine consequences that will be unleashed.
The shift from parameter-based to memory-based learning exemplifies this perfectly: it's not just a technical improvement but a restructuring of how AI systems accumulate and compound value over time, invalidating some of the core assumptions that have guided AI development since the transformer revolution.
The Anatomy of Agentic Memory
To understand why Memento signals a discontinuity, we must first examine what memory means in the context of autonomous agents.
As I explored in "The Great AI Discontinuity", we're witnessing the emergence of truly agentic AI systems that can "understand contexts, set their own sub-goals, and navigate complex decision trees without constant human guidance." But these systems face a critical limitation: without memory, they're sophisticated but static.
Memento's memory architecture isn't the static retrieval of RAG systems, which simply fetch relevant documents from a corpus. Nor is it the parametric memory embedded in model weights through training.
It's something qualitatively different: episodic, experiential, and evolutionary.
Episodic Structure: Beyond Information Retrieval
Traditional AI systems operate on declarative knowledge, such as facts, relationships, and patterns extracted from training data. When you ask GPT-4 about the capital of France, it retrieves this information from patterns encoded in its weights during training. This is analogous to semantic memory in humans, the general knowledge divorced from specific experiences.
Memento implements what cognitive scientists call episodic memory, which is the recording of specific experiences situated in time and context. Each interaction generates a complete trace: the initial problem state, the strategy employed, and whether it succeeded or failed. These aren't abstract patterns but concrete experiences with contextual details that make them retrievable and applicable.
Consider a coding agent attempting to fix a complex bug.
With parametric memory alone, it might know that null pointer exceptions require checking for uninitialized variables. With episodic memory, it remembers the specific instance where a similar bug in a React component was caused by an async state update, the exact sequence of debugging steps that revealed the issue, and the particular fix that resolved it. This specificity enables nuanced problem-solving that goes beyond pattern matching to genuine learning from experience.
Memento formalizes this through a Memory-augmented Markov Decision Process (M-MDP) - essentially a framework where agents maintain external memory banks containing tuples of past experiences (state, action, reward) that can be retrieved and adapted for future decisions.
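The M-MDP formulation is easier to picture in code. Below is a minimal, illustrative sketch of such a case bank; the class names, the binary reward, and the pruning rule are my own assumptions for illustration, not the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    """One episodic experience: the problem state, what was tried, how it went."""
    state: str     # textual description of the task or situation
    action: str    # the plan or tool sequence the agent executed
    reward: float  # e.g. 1.0 for success, 0.0 for failure

@dataclass
class CaseBank:
    """External memory: appending a case is the entire 'learning' step."""
    cases: list = field(default_factory=list)

    def write(self, state: str, action: str, reward: float) -> None:
        # No backpropagation: the new experience is immediately retrievable.
        self.cases.append(Case(state, action, reward))

    def prune(self, min_reward: float) -> None:
        # Selective forgetting: drop outdated or low-value cases without retraining.
        self.cases = [c for c in self.cases if c.reward >= min_reward]

bank = CaseBank()
bank.write("fix null pointer in React component", "check async state update", 1.0)
bank.write("fix null pointer in React component", "reinstall dependencies", 0.0)
bank.prune(min_reward=0.5)
assert len(bank.cases) == 1
```

Note how the three properties listed above fall directly out of this design: writing is instant, reading is immediate, and forgetting is a list filter.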

Learning no longer requires backpropagation through millions of parameters. New experiences are immediately available for future decisions. The agent can selectively forget outdated experiences without retraining.
Case-Based Reasoning: The Bridge Between Past and Present
This episodic foundation enables the second innovation: sophisticated case-based reasoning that mirrors human expertise.
The technical implementation employs a two-stage architecture: a Planner that retrieves and adapts relevant past cases, and an Executor that turns the resulting plan into action.
The Planner's matching process operates across multiple dimensions simultaneously. It evaluates structural similarity - recognizing when a React state management issue mirrors the architecture of a past async Python bug, even if surface details differ completely. It checks contextual alignment, ensuring the constraints and resources match previous scenarios. It weights outcomes, prioritizing approaches that succeeded over those that failed. And it factors in temporal relevance, giving recent experiences more weight than aging solutions that may no longer apply.
The Executor then transforms these insights into action, orchestrating everything from web searches to code execution. It writes rich annotations back to memory - exactly which API calls succeeded, what sequencing proved optimal, where bottlenecks emerged, and why certain approaches failed.
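A single Planner-Executor cycle can be sketched as follows. This is an illustrative outline, not Memento's actual code: `retrieve_similar`, `llm_plan`, and `run_tools` are hypothetical stand-ins for the retrieval layer, the LLM call, and the tool-execution layer.

```python
def planner_executor_step(memory, task, retrieve_similar, llm_plan, run_tools):
    # Stage 1 (Planner): condition the plan on similar past cases,
    # successes and failures alike.
    precedents = retrieve_similar(memory, task, k=4)
    plan = llm_plan(task, precedents)

    # Stage 2 (Executor): act, then write a rich annotation back to memory.
    outcome = run_tools(plan)
    memory.append({
        "state": task,
        "action": plan,
        "reward": 1.0 if outcome["success"] else 0.0,
        "notes": outcome.get("notes", ""),  # which calls worked, where bottlenecks were
    })
    return outcome

# Toy stand-ins so the cycle can be exercised end to end.
memory = []
outcome = planner_executor_step(
    memory, "fix failing integration test",
    retrieve_similar=lambda mem, task, k: mem[-k:],
    llm_plan=lambda task, precedents: f"plan for: {task}",
    run_tools=lambda plan: {"success": True, "notes": "rerun passed"},
)
assert memory[0]["reward"] == 1.0
```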
This transforms every execution into a teaching moment for future agents.
Contrasting Paradigms: Memory vs. Parameters
The battle lines in agentic AI are now clearly drawn, and Memento’s success highlights what is at stake.
Traditional fine-tuning approaches - whether through LoRA (Low-Rank Adaptation, a parameter-efficient tuning method) adapters or RLHF (Reinforcement Learning from Human Feedback) - require you to alter the model's parameters. This is expensive: training even a modest adapter for a 70B parameter model routinely consumes hundreds of thousands of dollars in GPU costs. It’s also slow: each training cycle takes days or weeks, and the resulting model is frozen at that moment in time.
Memento demonstrates a very different path. By maintaining an external case bank that operates at runtime, it achieves 50-80% lower computational costs simply by eliminating gradient computation. There is no training pipeline, no model versioning, no careful rollout procedures: New experiences become available instantly. When an agent encounters a novel situation, that experience improves the next day’s performance without any retraining.
This is not just technical; it is also strategic. In the fine-tuning paradigm, the moat belongs to those with the most compute: OpenAI, Google, Anthropic. But with memory-augmented learning, the moat shifts to those with the best domain-specific experiences. A startup with deep expertise in legal contracts can build agents that outperform GPT-4 on contract analysis, not by outspending OpenAI on compute, but by accumulating and curating superior execution data. With Memento, the playing field could be radically leveled.
The Discontinuity: From Training-Time to Runtime Learning
This architectural shift from parameter-based to memory-based learning represents a true Discontinuity, invalidating assumptions about how AI creates and captures value. Three transformations illustrate the magnitude of this shift:
The Compute Arbitrage
Traditional fine-tuning approaches require substantial computational resources. As discussed above, training a LoRA adapter for a 70B parameter model is expensive.
But the real arbitrage isn't just in training costs. It's in the elimination of the training-deployment cycle entirely. When an agent encounters a novel situation on Tuesday, that experience is immediately available to improve Wednesday's performance. There's no model versioning, no A/B testing of different fine-tuning runs, no careful rollout procedures.
Learning happens continuously, at runtime, as a natural consequence of operation.
The Composability Revolution
Perhaps more significant than cost reduction is the newfound composability of learned behaviors.
In the parameter-update paradigm, combining the capabilities of two fine-tuned models is a complex research problem. You can't simply average their weights or concatenate their layers. Each model is a monolithic artifact, its knowledge entangled in millions of inscrutable parameters.
Memory-based learning enables modular composition. Different agents can share memory banks, selectively importing experiences relevant to their domains. A legal research agent can borrow negotiation strategies from a contracts agent without inheriting its specific case knowledge. Teams of agents can maintain hierarchical memories, local expertise rolling up to departmental knowledge pools.
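To make selective memory sharing concrete, here is one hypothetical way an agent could import a subset of another agent's cases. The `relevant` filter, the tag scheme, and the case fields are illustrative assumptions, not part of the paper:

```python
def import_experiences(own, other, relevant, only_successes=True):
    """Borrow another agent's cases selectively, without inheriting its whole memory.

    `relevant` is a hypothetical domain filter, e.g. a tag check or similarity test.
    """
    borrowed = [c for c in other
                if relevant(c) and (c["reward"] > 0 or not only_successes)]
    return own + borrowed

# A contracts agent's memory, tagged by domain.
contracts_memory = [
    {"state": "negotiate SLA penalty clause", "action": "anchor high, concede slowly",
     "reward": 1.0, "tags": {"negotiation"}},
    {"state": "parse indemnity clause", "action": "clause-by-clause extraction",
     "reward": 1.0, "tags": {"contracts"}},
]

# A legal research agent borrows only the negotiation strategies.
legal_research_memory = import_experiences(
    own=[], other=contracts_memory,
    relevant=lambda c: "negotiation" in c["tags"],
)
assert len(legal_research_memory) == 1
```

Because cases stay discrete rather than being averaged into weights, this kind of filtering and merging is a data operation, not a research problem.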
This architecture enables what we might call swarm intelligence for AI agents.
Unlike traditional swarm systems that rely on simple rules producing emergent behaviors, memory-augmented agent swarms share learned strategies and accumulated wisdom. Each agent contributes to and benefits from a collective intelligence that transcends individual capabilities. When one agent discovers a novel solution to a complex problem, that breakthrough immediately becomes available to the entire swarm—not as a parameter update requiring retraining, but as an accessible memory that can be retrieved, adapted, and refined by others.
This composability extends temporally as well. Organizations can checkpoint memory states, experiment with different learning strategies, and roll back if performance degrades. They can A/B test not different models but different memory curation strategies. They can even implement "memory inheritance" where new agents bootstrap from the accumulated experiences of their predecessors.
The Democratization of Continuous Learning
Most profoundly, Memento democratizes access to continuously learning systems. When adaptation required fine-tuning, only organizations with massive computational resources and ML expertise could build truly adaptive agents. OpenAI, Google, and Anthropic held structural advantages through their infrastructure and talent density.
Memory-augmented learning levels this playing field. A startup with access to GPT-4's API can build agents that outperform OpenAI's own implementations on specialized tasks. The moat is no longer compute or model access. Instead, it's the accumulation and curation of domain-specific experiences. This shifts competitive advantage from those with the most resources to those with the most insight into specific problem domains.
These technical advances point to something even more profound: the emergence of an entirely new asset class.
Execution Data: The New Strategic Asset
Memento's architecture births an entirely new category of valuable data: execution data. This isn't your traditional enterprise data sitting in Snowflake warehouses, nor is it the training data that fed large language models. It is dynamic, experiential, and compounding in value over time.
The Nature of Execution Data
The distinction between traditional data and execution data is fundamental:
Traditional enterprise data answers "what happened?"—sales records, customer demographics.
Training data encodes "what patterns exist?"—language structures that get compressed into GPT-4's weights.
Execution data, as demonstrated by Memento’s Case Bank, captures "what worked and why?"—the specific sequence of tool calls that solved a GAIA Level 3 task, the exact debugging steps that revealed an edge case, and the particular planning decomposition that succeeded where others failed.
When Memento stores a case tuple (state, action, reward), it's not just logging information—it's creating a reusable asset. Each successful debugging session, each failed negotiation strategy, and each creative solution becomes individually retrievable and adaptable. This is very different from training data that gets "baked into" model weights.
Consider what happens in Memento’s Case Bank: A successful approach to fixing a React async bug doesn't just disappear into averaged parameters. It remains a discrete, accessible experience that can be retrieved, examined, and adapted when a similar pattern emerges. This preservation of specificity is what transforms mere data into what I term "execution data."
The Compounding Effect
The compounding effect is visible in Memento’s performance curves. The paper shows accuracy improving from 78.65% to 84.47% over five iterations on DeepResearcher through memory accumulation (and not retraining). More tellingly, their out-of-distribution results show 4.7% to 9.6% absolute gains when case-based memory is enabled. This isn't just more data; it's each experience building on previous foundations.
Traditional data exhibits diminishing returns. The millionth customer record adds less insight than the thousandth. The billionth web page provides fewer novel patterns than the millionth. This is why data moats, while valuable, eventually plateau in their defensive power.
Execution data, by contrast, can exhibit increasing returns. Early experiences establish basic competencies, such as how to parse error messages, structure API calls, and validate outputs. Later experiences build on these foundations, exploring edge cases and developing sophisticated strategies. The thousand-and-first debugging session might discover a novel approach that dramatically improves performance on an entire class of problems.
This compounding effect creates what I've termed within the Durable Growth Moat framework an "adaptive capacity". This describes the ability to strengthen during periods of disruption rather than merely survive them. Each challenge faced, each edge case encountered, each novel solution discovered adds to an ever-growing repository of experiential wisdom that competitors cannot easily replicate.
The result is winner-take-all dynamics within specific domains. The first legal AI to handle a thousand contract negotiations won’t just have more data. It will have a richer understanding of the problem space, including rare edge cases and creative solutions that competitors can't easily acquire. Each additional interaction widens this gap, creating an "experience moat" that compounds over time.
The Emergence of Memory Capital
In financial terms, execution data represents a new form of capital: memory capital. Like intellectual capital or brand capital, it's an intangible asset that generates future returns. But unlike these traditional intangibles, memory capital—as demonstrated by Memento’s Case Bank—can be precisely measured (number of cases), directly deployed (retrieval and adaptation), and explicitly valued (performance improvement per case). Consider Memento’s GAIA validation set performance: 87.88% accuracy built from accumulated cases. This memory bank isn't just a technical artifact—it's a productive asset that generates value with each deployment.
The market is already pricing in this shift, though perhaps without fully understanding it. MongoDB's ~36% YTD share price gain reflects demand for vector storage capabilities, exactly what Memento uses for its Case Memory module. Weaviate and similar vector databases are seeing explosive growth.
But Memento suggests the real winners won't be pure infrastructure plays. The paper's use of simple cosine similarity for retrieval shows that competitive advantage doesn't come from sophisticated database technology—it comes from the quality and curation of the cases themselves. Infrastructure commoditizes, but domain-specific execution data compounds.
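To see just how little machinery this retrieval step requires, here is a toy cosine-similarity lookup over precomputed state embeddings. The embeddings and case fields are invented for illustration; in practice the vectors would come from an embedding model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(cases, query_embedding, k=2):
    """Rank stored cases by similarity of their state embeddings to the query."""
    ranked = sorted(cases,
                    key=lambda c: cosine(c["embedding"], query_embedding),
                    reverse=True)
    return ranked[:k]

# Toy 3-dimensional embeddings standing in for real embedding-model output.
cases = [
    {"state": "React async bug", "embedding": [0.9, 0.1, 0.0]},
    {"state": "SQL join slowdown", "embedding": [0.0, 0.2, 0.9]},
]
top = retrieve(cases, query_embedding=[0.8, 0.2, 0.1], k=1)
assert top[0]["state"] == "React async bug"
```

A dozen lines of standard-library code suffice for the lookup itself, which is exactly the point: the defensible asset is the cases, not the retrieval machinery.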
A critical risk remains platform lock-in: memories optimized for OpenAI's retrieval patterns and prompt structures may require significant re-engineering to work with Anthropic's Claude or Google's Gemini, creating switching costs that could entrench early platform choices.
Conclusion
Memento proves that memory, not models, can drive performance. But more than that, it demonstrates that this memory - what I've termed execution data - represents a radically new form of value creation in AI. The companies that recognize this shift won't just build better agents; they'll accumulate compounding assets that create lasting competitive moats in the agentic era.
Memory is just the beginning. In Part 2, we'll explore how these episodic traces become the substrate for something even more profound: nested world models that enable agents to simulate entire realities. The convergence of memory and simulation is the most promising path to artificial general intelligence.