Instant Recall: A Pre-Computed Concept Index for O(1) Memory Retrieval in Persistent AI Agents

Your agent has the answer in its memory — months of notes, decisions, and docs — and still replies “what papers are you referring to?” The data isn’t gone; the retrieval just never fired in time. Instant Recall fixes this by building a tiny concept-to-memory index offline, once, so every query becomes a constant-time lookup that resolves before the model even reads your prompt — no inference-time search at all. Part of our open Building Jarvis series.

📄 Read the full paper (PDF) →

Abstract

Persistent AI agents accumulate months of interaction history yet routinely fail to retrieve knowledge they demonstrably own. We formalize this as Contextual Ownership Retrieval Failure (CORF), identify four failure modes from eighteen months of deployment logs, and introduce Instant Recall — a pre-computed concept-to-memory index that resolves retrieval before the model processes the prompt.

By mapping anchor vocabulary terms to pre-ranked memory clusters offline, Instant Recall achieves 0.85 CORF-Recall@20 at 54ms p95 latency — a 5.9× speedup over MemGPT-style sequential retrieval (0.79 at 320ms) — with a false positive rate of 0.18 and zero inference-time search computation.

A probabilistic source-coverage analysis establishes that parallel source aggregation, not query quality, is the binding constraint on retrieval completeness. Concept anchoring closes the 9-point recall gap that remains after source coverage is saturated. Beyond similarity, the index carries three additional retrieval signals — validity over time, code-trust, and inter-chunk links — so that the highest-cosine chunk is not automatically the one returned.

We position Instant Recall against contemporary open-source agent-context systems — most directly headroom’s reversible Compress-Cache-Retrieve compression layer — and report an independent embedding measurement that scopes which of our forward-looking signals a frozen embedding model can and cannot carry.

The claims, in numbers

54ms: p95 retrieval latency at 0.85 recall@20 (zero inference-time search)

5.9×: faster than MemGPT-style sequential retrieval (54ms vs 320ms p95)

9 pts: recall gap closed by concept anchoring after source coverage is saturated

Claims from the paper, stated here without the proofs — they’re in the PDF.
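
To give some intuition for why source coverage matters so much, here is a toy calculation. It is not the paper's analysis: it assumes each source independently holds the needed memory with some probability, and the source names and probabilities are made up for illustration.

```python
from math import prod


def coverage(hit_probs):
    """Chance that at least one queried source holds the needed memory.

    Assumes sources are independent, which is a simplification made
    for this sketch and not a claim taken from the paper.
    """
    return 1.0 - prod(1.0 - p for p in hit_probs)


# Hypothetical per-source hit probabilities (illustrative values only).
sources = {"chat_logs": 0.6, "notes": 0.5, "docs": 0.4}

best_single = coverage([max(sources.values())])  # query only the best source
all_parallel = coverage(sources.values())        # query every source in parallel

print(f"best single source: {best_single:.2f}")
print(f"all sources in parallel: {all_parallel:.2f}")
```

Under these made-up numbers, adding sources raises the ceiling on what any query can find, which is the sense in which coverage can bind before query quality does.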

How it works, in one minute

It’s an index, not a store. Like the index at the back of a book, Instant Recall doesn’t hold your memories — it knows where each one lives. Look up “peer review” and it hands back the right clusters instantly, instead of re-reading every page.
The work happens offline. A nightly consolidation pass maps anchor vocabulary terms to their nearest memory clusters in embedding space. At query time there’s nothing left to compute — just a constant-time lookup that finishes in ~50ms before the LLM gets its context.
Coverage beats cleverness. The analysis shows the binding constraint on completeness is searching all your sources in parallel, not writing a smarter query. Once coverage is saturated, concept anchoring closes the remaining recall gap.
Ranking uses more than cosine. Each entry carries extra signals — how valid a memory still is over time, how much to trust code, and durable links between chunks — so the highest-similarity chunk isn’t blindly the one returned.
It stays current. Real-time incremental updates, dedup against recurring logs, and post-retrieval re-ranking keep the index lean and the false-positive rate low without a rebuild.
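
The offline-build, constant-time-lookup split described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Instant Recall implementation: the `Cluster` fields, the half-life decay used for validity, the multiplicative score, and all names and values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    cluster_id: str
    centroid: tuple[float, ...]  # embedding of the cluster centre
    age_days: float = 0.0        # hypothetical input to a validity-over-time signal
    trust: float = 1.0           # hypothetical trust weight in [0, 1]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(anchor_vec, cluster, half_life_days=90.0):
    """Assumed ranking score: cosine, decayed by age and scaled by trust."""
    validity = 0.5 ** (cluster.age_days / half_life_days)
    return cosine(anchor_vec, cluster.centroid) * validity * cluster.trust


def build_index(anchors, clusters, top_k=2):
    """Offline pass: map each anchor term to its top_k pre-ranked cluster ids."""
    index = {}
    for term, vec in anchors.items():
        ranked = sorted(clusters, key=lambda c: score(vec, c), reverse=True)
        index[term] = [c.cluster_id for c in ranked[:top_k]]
    return index


def lookup(index, terms):
    """Query time: one dict lookup per anchor term, no similarity search."""
    hits = []
    for term in terms:
        for cluster_id in index.get(term, []):
            if cluster_id not in hits:
                hits.append(cluster_id)
    return hits


# Toy data: the stale cluster has the highest raw cosine but ranks second.
anchors = {"peer review": (1.0, 0.0)}
clusters = [
    Cluster("old-draft", (1.0, 0.0), age_days=180.0),
    Cluster("current-review", (0.9, 0.1)),
    Cluster("infra", (0.1, 0.9)),
]
index = build_index(anchors, clusters)
print(lookup(index, ["peer review", "unknown term"]))
# prints ['current-review', 'old-draft']
```

All the similarity work happens in `build_index`; `lookup` only reads a dictionary, which is where the constant-time behaviour per anchor term comes from in this sketch.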

Where this fits in the open-source agent-context stack

Two open-source projects released in 2025–2026 sit close enough to Instant Recall to deserve explicit positioning. The paper’s §2.4 covers both.

chopratejas/headroom ~24.7k ★

A reversible Compress-Cache-Retrieve (CCR) layer: agents compress the active context window by content type (JSON, code, logs, diffs), cache the originals, and retrieve them on demand. Reports 60–95% token savings at near-zero accuracy delta.

Relationship: headroom answers “how do I fit selected content in the window losslessly”; Instant Recall answers “which of months of stored content belongs in the window at all.” They are vertically adjacent — headroom as the compaction layer below, Instant Recall as the concept index above. Composable, not competing.

addyosmani/agent-skills ~56.8k ★

A discipline and skill library for coding agents. Not a memory system — it does not retrieve over a persistent store.

Relationship: One practice it formalizes is relevant here: a context-contract file that an agent reads before acting, separating reusable capability description from per-task state. Instant Recall’s narrow (content, timestamp, source) storage contract is the memory-stack analogue of that same discipline. The paper notes this as a convergence and does not claim to derive from it.
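
For a concrete picture of how narrow that contract is, here is one way the (content, timestamp, source) triple could be written down. Only the three field names come from the text; the class name, the types, and the example values are assumptions made for this sketch.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone


@dataclass(frozen=True)
class MemoryRecord:
    """Hypothetical record type holding only the three contract fields."""
    content: str
    timestamp: datetime
    source: str


# Illustrative values only.
record = MemoryRecord(
    content="Agreed to submit the revised draft after peer review.",
    timestamp=datetime(2025, 1, 15, 9, 30, tzinfo=timezone.utc),
    source="chat",
)

print([f.name for f in fields(record)])
```

Making the record frozen keeps stored memories immutable in this sketch, so anything derived from them (embeddings, cluster assignments, index entries) can live outside the store.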

What a frozen embedding can — and cannot — carry

Instant Recall runs end-to-end on a frozen sentence-embedding model. To scope exactly which operations that model can be trusted to carry, we ran an independent offline measurement on a separate frozen model (MiniLM, commodity hardware). The result was a sharp split between structural signals and learned semantic judgements:

0.875 AUROC: novelty signal (k-NN; structural)

0.896 AUROC: incongruity signal (clause-cosine; structural)

0.286 AUROC: harmfulness classification (learned semantic judgement; below chance)

Frozen embeddings encode structure and distribution reliably; the anchor–chunk clustering and cosine-based dedup the architecture relies on are in this regime. They do not carry learned semantic judgements, such as contradiction detection or calibrated retrieval-failure confidence, without fine-tuning. The paper is explicit about this boundary: signals that need judgement are scoped to require a fine-tuned or NLI model, not a cosine threshold. The structural operations have independent AUROC numbers to stand on.
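
To make the structural side concrete, here is a rough sketch of a k-NN novelty score and a pairwise AUROC computation. It is not the measurement protocol from the paper: the mean-distance definition of novelty, the toy 2-D vectors standing in for embeddings, and the labels are all assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_novelty(vec, memory, k=2):
    """Mean cosine distance from vec to its k nearest stored vectors."""
    if not memory:
        raise ValueError("memory must contain at least one vector")
    nearest = sorted(1.0 - cosine(vec, m) for m in memory)[:k]
    return sum(nearest) / len(nearest)


def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy stand-ins for stored embeddings and incoming items (label 1 = novel).
memory = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.2)]
queries = [(0.95, 0.05), (0.85, 0.15), (0.0, 1.0), (0.2, 0.9)]
labels = [0, 0, 1, 1]

scores = [knn_novelty(q, memory) for q in queries]
print(f"AUROC on toy data: {auroc(scores, labels):.2f}")
```

An AUROC of 0.5 means the score orders positives and negatives no better than chance, and values below 0.5 mean it tends to order them the wrong way round, which is how the 0.286 harmfulness figure above should be read.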

The Instant Recall implementation is part of Building Jarvis, an open series on persistent agent memory. Follow the work and contribute at github.com/globalcaos/tinkerclaw.

Read the paper

📄 Read the full paper (PDF) →

17 pages · the CORF taxonomy, the index architecture, importance scoring, beyond-cosine signals, and the full evaluation protocol

