Microsoft builds Memora, a memory system for AI agents

Microsoft Research published a paper at ICML 2026 introducing Memora, a new memory framework built to give AI agents persistent, scalable memory across long-horizon tasks. The system addresses the inability of AI agents to retain meaningful context between sessions.

Developed by Microsoft’s research team and now publicly available on GitHub, Memora outperforms every major existing memory approach on standard benchmarks while consuming up to 98 percent fewer tokens than loading the full conversation history into a model’s context window each time.

The problem every enterprise AI deployment hits

Today’s large language models are, in a technical sense, stateless. They do not retain information between sessions. When a conversation grows long — spanning weeks of project updates, decisions made, and constraints established — the model either re-reads the entire history, which consumes enormous amounts of compute, or relies on compressed summaries that lose the precise details that made the information useful.

Existing approaches have tried to address this. Retrieval-augmented generation indexes raw text fragments. Systems like Mem0 extract atomic facts. Graph-based tools like Zep and GraphRAG impose structured entity relationships. Each represents real progress. Each also forces the same trade-off: store everything and retrieval becomes slow and noisy; compress and summarise, and the details that matter most disappear.

Memora is built to resolve that tension rather than work around it.

How Memora works

The core design decision behind Memora is a clean separation between what is stored and how it is retrieved.

Each memory has two components. The first is a primary abstraction: a short six-to-eight-word phrase capturing what the memory is fundamentally about. The second is the full memory value — the complete content, including timelines, multi-turn discussions, specific names, numbers, and decisions.

Critically, only the primary abstraction is embedded for similarity search. The full content is never matched against a query through its own text; it is reached only through its abstraction. When new information arrives on the same topic, it merges into the existing memory entry rather than spawning a chain of fragmented, partially overlapping duplicates, which is the failure mode that plagues RAG and Mem0 at scale.
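As a rough illustration of the two ideas above, the Python sketch below embeds only the short abstraction and merges new content into an existing entry when the topics match. It is not Memora's code: the bag-of-words `embed` function stands in for a real embedding model, and `MemoryStore`, `MemoryEntry` and the `merge_threshold` of 0.6 are hypothetical names and values chosen for the example.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class MemoryEntry:
    abstraction: str                                 # short phrase; the only text that is embedded
    value: list[str] = field(default_factory=list)   # full content; never embedded


class MemoryStore:
    def __init__(self, merge_threshold: float = 0.6):
        self.entries: list[MemoryEntry] = []
        self.merge_threshold = merge_threshold       # hypothetical cutoff for "same topic"

    def _closest(self, text: str):
        """Return the entry whose abstraction is most similar to `text`, with its score."""
        query = embed(text)
        best, best_score = None, 0.0
        for entry in self.entries:
            score = cosine(query, embed(entry.abstraction))
            if score > best_score:
                best, best_score = entry, score
        return best, best_score

    def write(self, abstraction: str, content: str) -> MemoryEntry:
        """Merge into a same-topic entry if one exists, otherwise create a new one."""
        best, score = self._closest(abstraction)
        if best is not None and score >= self.merge_threshold:
            best.value.append(content)
            return best
        entry = MemoryEntry(abstraction, [content])
        self.entries.append(entry)
        return entry

    def search(self, query: str):
        """Match the query against abstractions only, then return the full entry."""
        best, _ = self._closest(query)
        return best


store = MemoryStore()
store.write("Apollo project launch timeline and milestones", "Kickoff on 3 March; beta due in June.")
store.write("Apollo project launch timeline milestones update", "Beta slipped to July after a vendor delay.")
store.write("Quarterly cloud budget approval for finance", "Budget capped for the quarter.")
print(len(store.entries))                      # 2: the two Apollo writes merged
print(store.search("apollo timeline").value)   # both Apollo notes, in write order
```

In this toy version, the second Apollo write lands in the first entry because the two abstractions share most of their words, while the budget note opens a new entry. A production system would presumably decide "same topic" with a learned embedding rather than word overlap.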

Alongside primary abstractions, Memora generates cue anchors — short contextual tags that provide alternative retrieval paths into the same memory. A single memory about a project timeline can be accessed through a query about a specific person’s role, a milestone date, or a project phase — all routing to the same underlying content through different entry points.
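One way to picture cue anchors is as a second index in which several tags point at the same memory. The sketch below is an assumption-laden illustration, not Memora's implementation: `Memory`, `AnchorIndex` and the sample data are hypothetical, and anchors are matched by exact, case-insensitive string comparison only because the article does not say how the real system matches them.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Memory:
    abstraction: str
    value: str
    cue_anchors: set[str] = field(default_factory=set)


class AnchorIndex:
    """Maps each cue anchor to the ids of the memories it points to."""

    def __init__(self):
        self.memories: dict[int, Memory] = {}
        self._by_anchor: dict[str, set[int]] = defaultdict(set)

    def add(self, memory: Memory) -> int:
        memory_id = len(self.memories)
        self.memories[memory_id] = memory
        for anchor in memory.cue_anchors:
            # Every anchor is another entry point into the same memory.
            self._by_anchor[anchor.lower()].add(memory_id)
        return memory_id

    def lookup(self, anchor: str) -> list[Memory]:
        # Exact, case-insensitive match (a simplification for this sketch).
        ids = self._by_anchor.get(anchor.lower(), set())
        return [self.memories[i] for i in sorted(ids)]


index = AnchorIndex()
index.add(Memory(
    abstraction="Apollo project timeline and key milestones",
    value="Dana owns QA; beta ships on 1 July; phase two starts in August.",
    cue_anchors={"Dana", "1 July", "phase two"},
))

# A person, a date and a project phase all route to the same memory.
for cue in ("dana", "1 July", "Phase Two"):
    print(cue, "->", index.lookup(cue)[0].abstraction)
```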

On top of this, Memora’s policy-guided retriever treats memory access as active reasoning rather than passive lookup. Instead of returning the top results from one similarity search, it iterates — refining queries, expanding through cue anchors, and deciding when enough relevant context has been gathered.
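The iterate-expand-stop loop can be sketched in a few lines of Python. This is only a toy under stated assumptions: the article does not describe how Memora's retrieval policy is implemented, so the fixed rule here (stop once `enough` memories are gathered or nothing new turns up) is a hand-written stand-in, and `Memory`, `matches` and `iterative_retrieve` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    abstraction: str
    value: str
    cue_anchors: set[str] = field(default_factory=set)


def matches(terms: set[str], memory: Memory) -> bool:
    """Toy relevance test: a term appears in the abstraction or among the anchors."""
    haystack = set(memory.abstraction.lower().split())
    haystack |= {a.lower() for a in memory.cue_anchors}
    return bool(terms & haystack)


def iterative_retrieve(query: str, memories: list[Memory],
                       enough: int = 2, max_steps: int = 3) -> list[Memory]:
    terms = set(query.lower().split())
    gathered: list[Memory] = []
    for _ in range(max_steps):
        new = [m for m in memories if m not in gathered and matches(terms, m)]
        gathered.extend(new)
        # Stand-in policy: stop when there is enough context or no progress.
        if len(gathered) >= enough or not new:
            break
        # Refine the query by expanding through the cue anchors just found.
        for m in new:
            terms |= {a.lower() for a in m.cue_anchors}
    return gathered


memories = [
    Memory("Apollo launch timeline and milestones", "Beta moved to July.", {"dana", "beta"}),
    Memory("QA staffing decisions for release", "Dana leads QA with two contractors.", {"dana", "contractors"}),
    Memory("Office relocation plan and budget", "Move scheduled for autumn.", {"facilities"}),
]

for m in iterative_retrieve("apollo", memories):
    print(m.abstraction)
```

Here a single pass over "apollo" finds only the timeline memory; the second pass follows its "dana" anchor to the staffing memory, and the loop then stops because two memories meet the `enough` threshold. The unrelated relocation memory is never pulled in.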

Why this matters for enterprise AI

An AI agent that genuinely remembers (recalling not just a decision but the reasoning behind it, who was involved, and what was ruled out) operates at a fundamentally different level of usefulness than one that resets between sessions. The step from a session-level assistant to a persistent collaborative agent is not a minor improvement. It is the difference between a tool that helps with tasks and one that carries institutional knowledge over time.

Microsoft Research is already extending Memora in three directions: MemLoop, which enables a memory system to learn from its own retrieval failures; Deferred Memory, which explores when to delay storing information until enough context exists to store it accurately; and Group Memory, which investigates how knowledge can be shared across teams and agents while preserving access boundaries and provenance.
