Total Recall: Pointer-Based Compaction and Task-Conditioned Retrieval for Persistent LLM Agents


Your AI agent nails the task — then a context window later, it has forgotten the one hash, file path, or decision that actually mattered. Most agents “compact” old context by summarizing it, which quietly throws away the precise details. Total Recall is a memory architecture that never loses them: it treats the context window as a cache over a durable store, evicts with recoverable pointers instead of lossy summaries, and keeps total memory bounded. This is the first paper in our open research series on giving agents real, persistent memory.

📄 Read the full paper (PDF) →

Abstract

Persistent LLM agents must preserve precise strings, causal chains, and decision rationales across sessions that routinely exceed context limits. The industry-standard approach, narrative compaction, replaces this high-resolution state with lossy prose summaries, creating an “only copy” failure: once the summary is the sole record, any detail it drops cannot be recovered.

We present Total Recall, a lossless, event-sourced memory architecture that treats the context window as a managed cache over a durable store. Instead of summarizing, it evicts history via pointer-based compaction — compact time-range markers with topic hints and retrieval directives — while keeping all evicted content recoverable through a recall(query) tool. Retrieval is task-conditioned: what gets re-injected depends on the active task, not just embedding similarity. A write-reconciliation step keeps total memory bounded along both axes — the in-context cache and the long-term store.
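To make pointer-based compaction concrete, here is a minimal Python sketch under stated assumptions, not the TRACE implementation. A hypothetical MemoryCache keeps at most `capacity` events verbatim in context. When the cache overflows, it evicts the oldest live events into a single Pointer marker (time range plus topic hints) and answers recall() from the durable store rather than from the context. Word overlap stands in for embedding similarity. The task_weight boost is one hypothetical way to condition ranking on the active task, and the topic_hints heuristic is likewise an assumption.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One immutable entry in the durable, append-only store."""
    t: int      # logical timestamp (position in the log)
    text: str


@dataclass(frozen=True)
class Pointer:
    """Compact marker left in context in place of a run of evicted events."""
    t_start: int
    t_end: int
    topics: tuple[str, ...]

    def render(self) -> str:
        hints = ", ".join(self.topics)
        return f"[evicted t={self.t_start}..{self.t_end}; topics: {hints}; use recall(query)]"


def _tokens(text: str) -> set[str]:
    return {w.lower().strip(".,:;()") for w in text.split()}


def topic_hints(events: list[Event], k: int = 3) -> tuple[str, ...]:
    # Hypothetical heuristic: most frequent longer words in the evicted run.
    counts = Counter(w.lower().strip(".,:;()") for e in events for w in e.text.split() if len(w) > 4)
    return tuple(w for w, _ in counts.most_common(k))


class MemoryCache:
    """Context window as a bounded cache over a durable event log (sketch)."""

    def __init__(self, capacity: int):
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        self.capacity = capacity             # max events kept verbatim in context
        self.store: list[Event] = []         # durable log: only ever appended to
        self.context: list[Event | Pointer] = []

    def append(self, text: str) -> None:
        event = Event(t=len(self.store), text=text)
        self.store.append(event)
        self.context.append(event)
        self._compact()

    def _compact(self) -> None:
        pointers = [c for c in self.context if isinstance(c, Pointer)]
        live = [c for c in self.context if isinstance(c, Event)]
        if len(live) <= self.capacity:
            return
        keep = self.capacity // 2
        evicted, kept = live[:-keep], live[-keep:]
        # Evicted events stay in self.store; the context keeps only a pointer to them.
        marker = Pointer(evicted[0].t, evicted[-1].t, topic_hints(evicted))
        self.context = pointers + [marker] + kept

    def recall(self, query: str, task: str = "", k: int = 2, task_weight: float = 0.5) -> list[str]:
        """Return verbatim events from the durable store, ranked by query and task overlap."""
        q, active = _tokens(query), _tokens(task)

        def score(e: Event) -> float:
            toks = _tokens(e.text)
            return len(q & toks) + task_weight * len(active & toks)

        ranked = sorted(self.store, key=score, reverse=True)
        return [e.text for e in ranked[:k] if score(e) > 0]


mem = MemoryCache(capacity=4)
mem.append("Deploy key hash is 9f3ab7c for the staging cluster")
mem.append("Decided to pin numpy below 2.0 because of ABI breakage")
for i in range(5):
    mem.append(f"Routine status update number {i}")

print(mem.context[0].render())
hits = mem.recall("what was the deploy hash", task="staging deploy")
print(hits)
```

Eviction only rewrites the context list, so every original event remains in `store` and the exact hash comes back verbatim through recall(). A real system would persist the log and use a proper retriever in place of word overlap.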

The TRACE production implementation validates these claims across controlled benchmarks and months of real deployment.

The claims, in numbers

  • 100%: needle-in-haystack recall under forced compaction (vs 0% for truncation)
  • 94%: exact-match recall at 2 hops (vs 4% for narrative compaction and 36% for MemGPT-style paging)
  • 14,000+: real emails indexed in production with zero lossless-invariant violations

These are claims from the paper, stated here without the proofs. The methodology, benchmarks, and derivations are all in the PDF.

How it works, in one minute

  • The context window is a cache, not memory. Treat it like one — back it with a durable store that never loses the original.
  • Pointer-based compaction. Instead of summarizing old context into lossy prose, evict it and leave a compact time-range marker with topic hints — the full detail stays recoverable on demand via a recall() tool.
  • Task-conditioned retrieval. What gets pulled back into context depends on the active task and expected next needs, not just raw similarity.
  • Bounded by design. A write-reconciliation step (add / update / delete / none) keeps the working memory from growing forever while the audit log stays append-only and complete.
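The reconciliation step can be sketched the same way. This hypothetical ReconciledMemory keys facts by an explicit string and picks one of the four operations by comparing the candidate value with what is already stored. The paper's system may make this decision differently (for example, with a model), so treat the keying rule as an assumption. Working memory holds at most one value per key, while the audit log records every decision, including no-ops.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Op(Enum):
    ADD = "add"
    UPDATE = "update"
    DELETE = "delete"
    NONE = "none"


@dataclass(frozen=True)
class AuditEntry:
    op: Op
    key: str
    old: str | None
    new: str | None


class ReconciledMemory:
    """Bounded working memory plus an append-only audit log (sketch)."""

    def __init__(self) -> None:
        self.working: dict[str, str] = {}   # at most one value per fact key
        self.audit: list[AuditEntry] = []   # every decision, never rewritten

    def reconcile(self, key: str, value: str | None) -> Op:
        """Decide add/update/delete/none for one candidate write, then apply it.

        value=None means "this fact is no longer true" (a retraction).
        """
        old = self.working.get(key)
        if value is None:
            op = Op.DELETE if key in self.working else Op.NONE
        elif key not in self.working:
            op = Op.ADD
        elif old == value:
            op = Op.NONE
        else:
            op = Op.UPDATE

        if op in (Op.ADD, Op.UPDATE):
            self.working[key] = value
        elif op is Op.DELETE:
            del self.working[key]
        self.audit.append(AuditEntry(op, key, old, value))
        return op


mem = ReconciledMemory()
ops = [
    mem.reconcile("db.host", "10.0.0.5"),   # new fact
    mem.reconcile("db.host", "10.0.0.5"),   # duplicate
    mem.reconcile("db.host", "10.0.0.9"),   # changed fact
    mem.reconcile("tmp.flag", "on"),
    mem.reconcile("tmp.flag", None),        # retraction
]
print([op.value for op in ops])  # ['add', 'none', 'update', 'add', 'delete']
print(mem.working)               # {'db.host': '10.0.0.9'}
print(len(mem.audit))            # 5
```

Five writes leave one live fact in working memory but five entries in the audit log. This keeps the bounded view and the complete history separate.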

Total Recall memory architecture: storage and compaction feed an indexing layer, with a nightly consolidation loop.

Read the paper




18 pages · the full architecture, benchmarks, and the TRACE implementation

Was this useful?

We’re building these in the open and we want your read on them. Did this land — 👍 or 👎? What would you want the next paper to dig into? Tell us in the comments below.
