Once your agent has months of memory, four failure modes show up that a general retrieval stack never solves: concept lookups slow to a linear crawl, retrieval ignores what you’re actually working on right now, autonomous writes silently contradict old facts, and compaction permanently shreds detail that was used once and never queried again. MNEMOSYNE closes all four — as a single plugin layered over your existing memory core, with zero changes to the underlying source. Part of our open Building Jarvis series.
Abstract
Personal AI assistants that accumulate months of memory face four gaps that general-purpose retrieval systems do not close: concept lookup that should be constant-time degrades to linear search at scale, retrieval scoring ignores the agent’s current task, memory writes can silently introduce contradictions, and context compaction permanently destroys information that was used once but never queried again.
We present MNEMOSYNE, a plugin-layered enhancement suite that addresses all four gaps without modifying the underlying memory-core codebase. MNEMOSYNE registers four hooks — retrieval_pre, before_message_write, before_compaction, and after_compaction — on top of OpenClaw’s markdown-first memory system. A concept index and compaction-aware capture are deployed in production; task-conditioned retrieval scoring and the contradiction gate are scaffolded for the next release.
We also describe how the contradiction gate becomes a bi-temporal validity layer (§9.5): rather than blocking or warning, a contradicting write supersedes the prior fact by closing its validity interval, so the corpus retains full history and every retrieval can ask “what is true now” versus “what was true then.” The validity substrate is not paper-only — memory-core already ships the validity columns, the temporal search predicate, and the supersede writer, so MNEMOSYNE has only to route the gate’s decision into it.
We position the compaction-capture feature explicitly against reversible Compress-Cache-Retrieve as implemented in chopratejas/headroom, which solves the eviction-loss problem from the opposite end (lossless-by-caching rather than selective-by-filtering), and we are candid that MNEMOSYNE’s central performance claims remain architectural rather than benchmarked. The plugin packages all four features as a single installable unit for marketplace distribution.
The claims, in numbers
Claims from the paper, stated here without the proofs; the methodology, complexity analysis, and architecture are all in the PDF. The O(1) property is theoretical: it is a data-structure guarantee (average-case hash-map lookup), not a measured latency. Wall-clock benchmarks on the production corpus are pending (§10.4).
How it works, in one minute
- A concept index for instant lookups. A pre-computed map from recurring concepts — project names, people, tools, error classes — to their memory chunks. A query that names a known concept is answered with a token lookup in a Map, skipping the full hybrid scan entirely. It stays constant-time as the corpus grows from thousands to hundreds of thousands of chunks.
- Task-conditioned retrieval. A score modifier biases retrieval toward whatever the agent is working on right now — debugging a cron job, drafting an email, reviewing code — instead of ranking purely on embedding similarity.
- A contradiction gate on writes. Autonomous nightly consolidation writes its own memories, and those can contradict earlier facts with no human in the loop. The gate intercepts a write before it’s promoted to durable storage and decides: allow, warn, block — or supersede.
- Supersede instead of block (bi-temporal validity). Rather than throwing a contradicting fact away, MNEMOSYNE closes the old fact’s validity interval and opens a new one. The corpus keeps full history, so every retrieval can ask “what’s true now” versus “what was true then.” The validity columns, temporal search predicate, and supersede writer already ship in memory-core — MNEMOSYNE routes the gate’s decision into them.
- Compaction-aware capture. A hook fires the moment context is about to be evicted and snapshots the content a pointer alone can’t recover — so detail used once isn’t lost forever when the window compacts.
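As a rough illustration of the concept-index fast path in the first bullet, here is a minimal sketch in Python. The function names, the tokenizer regex, and the sample chunks are assumptions made for this example and are not MNEMOSYNE's actual code or its `anchors.json` format.

```python
import re
from collections import defaultdict
from typing import Optional

TOKEN = re.compile(r"[a-z0-9_-]+")


def build_concept_index(chunks: dict[str, str], concepts: list[str]) -> dict[str, list[str]]:
    """Precompute concept -> chunk ids once (for example in a nightly rebuild)."""
    index: dict[str, list[str]] = defaultdict(list)
    for chunk_id, text in chunks.items():
        tokens = set(TOKEN.findall(text.lower()))
        for concept in concepts:
            if concept in tokens:
                index[concept].append(chunk_id)
    return dict(index)


def lookup(query: str, index: dict[str, list[str]]) -> Optional[list[str]]:
    """Return chunk ids for the first known concept named in the query.

    Each query token costs one average-case O(1) dict probe, so the work
    depends on query length, not on how many chunks the corpus holds.
    None means "no known concept": the caller falls back to the full scan.
    """
    for token in TOKEN.findall(query.lower()):
        if token in index:
            return index[token]
    return None


# Hypothetical sample data.
chunks = {
    "c1": "Fixed the cron job that rotates logs for tinkerclaw",
    "c2": "Lunch notes, nothing technical",
    "c3": "tinkerclaw release checklist drafted",
}
index = build_concept_index(chunks, ["tinkerclaw", "cron"])

hits = lookup("what did we decide about tinkerclaw?", index)  # fast path
miss = lookup("what's for lunch", index)                      # falls through
```

The index is built once and only probed at query time, which is why the lookup cost does not grow with the corpus. The sketch matches single tokens only; multi-word concepts would need a different keying scheme.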
The reason it’s a plugin and not a fork: failure isolation (a crash in the suite doesn’t take down the memory core), upgrade independence (no merge conflicts when the core ships updates), and clean distribution — install it from the marketplace, toggle each of the four features on or off independently.
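The failure-isolation and per-feature toggle ideas can be sketched with a small host-side hook registry. This is a hypothetical stand-in, not OpenClaw's plugin API: only the four hook names come from the paper, while `HookRegistry`, its methods, and the feature names are invented for illustration.

```python
from typing import Callable, Optional

# The four hook names are from the paper; everything else is hypothetical.
HOOK_NAMES = ("retrieval_pre", "before_message_write",
              "before_compaction", "after_compaction")

Handler = Callable[[dict], Optional[dict]]


class HookRegistry:
    """Toy registry that isolates handler crashes from the host."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[tuple[str, Handler]]] = {n: [] for n in HOOK_NAMES}
        self.enabled: dict[str, bool] = {}
        self.errors: list[str] = []

    def register(self, hook: str, feature: str, handler: Handler) -> None:
        if hook not in self._handlers:
            raise ValueError(f"unknown hook: {hook}")
        self._handlers[hook].append((feature, handler))
        self.enabled.setdefault(feature, True)

    def fire(self, hook: str, payload: dict) -> dict:
        for feature, handler in self._handlers[hook]:
            if not self.enabled.get(feature, False):
                continue  # feature toggled off
            try:
                result = handler(payload)
                if result is not None:
                    payload = result
            except Exception as exc:
                # A plugin crash is recorded, never propagated to the core.
                self.errors.append(f"{feature}: {exc}")
        return payload


def broken_scorer(payload: dict) -> dict:
    raise RuntimeError("boom")


registry = HookRegistry()
registry.register("retrieval_pre", "concept_index",
                  lambda p: {**p, "fast_path": True})
registry.register("retrieval_pre", "task_scoring", broken_scorer)

out = registry.fire("retrieval_pre", {"query": "cron"})

registry.enabled["concept_index"] = False  # toggle one feature off
out_disabled = registry.fire("retrieval_pre", {"query": "cron"})
```

In this sketch the crashing handler leaves the payload untouched and the remaining handlers still run, which is the behavior the plugin boundary is meant to guarantee.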
What’s live, what’s scaffolded
| Feature | Status | Notes |
|---|---|---|
| Concept index | ● Live | anchors.json persisted, lazy loading, nightly rebuild |
| Compaction-aware capture | ● Live | Capturing to captured/, FTS5 indexing confirmed |
| Task-conditioned scoring | ● Scaffolded | Hook registered; no task descriptor extraction yet |
| Contradiction gate | ● Scaffolded | Injects passive warnings; entity-level conflict detection not yet wired |
| Bi-temporal supersede (§9.5) | ● Core-ready, plugin-pending | Validity columns + supersede writer already in memory-core; MNEMOSYNE routing pending |
| Reproducible eval harness (§10.4) | ○ Not built | No latency or accuracy benchmark exists; first-class limitation, not a footnote |
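
To make the supersede row concrete, here is a minimal sketch of closing one validity interval and opening another, assuming a toy in-memory store. `Fact`, `ValidityStore`, the `entity` key, and the sample values are all hypothetical; memory-core's actual validity columns and supersede writer are not shown in this post. The sketch tracks only the validity interval, not when each row was recorded.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    entity: str                          # hypothetical key, e.g. "deploy.target"
    value: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still true"


class ValidityStore:
    """Toy in-memory stand-in for a store with validity columns."""

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def supersede(self, entity: str, value: str, at: datetime) -> None:
        # Close the interval of the currently open fact instead of deleting it.
        for fact in self.facts:
            if fact.entity == entity and fact.valid_to is None:
                fact.valid_to = at
        self.facts.append(Fact(entity, value, valid_from=at))

    def as_of(self, entity: str, when: datetime) -> Optional[str]:
        # Temporal predicate: valid_from <= when < valid_to (or still open).
        for fact in self.facts:
            if (fact.entity == entity and fact.valid_from <= when
                    and (fact.valid_to is None or when < fact.valid_to)):
                return fact.value
        return None


store = ValidityStore()
store.supersede("deploy.target", "staging-eu", datetime(2025, 1, 10))
store.supersede("deploy.target", "prod-us", datetime(2025, 3, 2))

was_true_then = store.as_of("deploy.target", datetime(2025, 2, 1))
is_true_now = store.as_of("deploy.target", datetime(2025, 6, 1))
```

Both facts remain in the store after the second write; only the first one's `valid_to` changes. That is what lets a query distinguish "what is true now" from "what was true then".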
Where this sits in the ecosystem
chopratejas/headroom (Apache-2.0) is the most directly comparable external system to MNEMOSYNE’s compaction capture. It implements reversible Compress-Cache-Retrieve: rather than dropping evicted content, headroom compresses it, caches the originals, and lets the LLM retrieve the full original on demand. It ships per-content-type compressors — statistical crushing for JSON arrays (SmartCrusher), AST-aware code compression via tree-sitter, and dedicated log/diff/text handlers — reports 60–95% token savings at near-zero accuracy delta on GSM8K, TruthfulQA, SQuAD, and BFCL, and ships a reproducible evaluation suite (python -m headroom.evals) alongside a trained compression model (Kompress-v2-base).
headroom and MNEMOSYNE solve overlapping problems from opposite ends. headroom is content-agnostic, reversible, and lossless-by-caching: every byte survives eviction in compressed form and is recoverable. MNEMOSYNE is selective and semantic: it deliberately filters out greetings and redundant content and keeps only the “by the way” details, feeding them into the concept index and — via the bi-temporal layer — a validity-aware corpus.
The two are composable rather than competing: headroom could compress MNEMOSYNE’s captured/ corpus; conversely, MNEMOSYNE’s pointer-plus-capture pair is lossy exactly where headroom’s cache is reversible. The paper is candid that headroom is the stronger system on the narrow axis of eviction-loss recovery.
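The selective side of that trade-off can be illustrated with a small filter that drops greetings and content already stored before anything is captured. This is a sketch under stated assumptions: the greeting pattern, the whitespace-and-case normalization, and the function names are hypothetical, and MNEMOSYNE's real selection logic is likely more semantic than this.

```python
import re

# Hypothetical pattern for messages that are only a greeting or acknowledgement.
GREETING = re.compile(
    r"^\s*(hi|hello|hey|thanks|thank you|ok|okay|bye)\b[\s!.,]*$",
    re.IGNORECASE,
)


def normalize(text: str) -> str:
    """Collapse case and whitespace so near-identical messages compare equal."""
    return " ".join(text.lower().split())


def select_for_capture(messages: list[str], already_stored: set[str]) -> list[str]:
    """Keep only messages worth snapshotting before the window compacts."""
    seen = {normalize(m) for m in already_stored}
    kept = []
    for message in messages:
        key = normalize(message)
        if not key or GREETING.match(message) or key in seen:
            continue  # empty, greeting-only, or redundant
        seen.add(key)
        kept.append(message)
    return kept


# Hypothetical sample data.
evicted = [
    "Hi!",
    "By the way, the staging DB password rotates every Friday.",
    "by the way,  the staging DB password rotates every Friday.",
    "Thanks",
    "The backup lives on the NAS.",
]
captured = select_for_capture(evicted, already_stored={"The backup lives on the NAS."})
```

Everything the filter drops is unrecoverable from this path alone, which is exactly the lossiness the comparison with headroom's reversible cache points at.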
What MNEMOSYNE provides that headroom does not: O(1) concept-index retrieval, task-conditioned scoring, write-time contradiction gating, and bi-temporal supersession — the full four-hook composition for a single personal agent’s memory system. The slogans: headroom maximizes recall-of-everything-per-token; MNEMOSYNE maximizes signal-per-stored-chunk.
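Of the features listed above, task-conditioned scoring is described least concretely, since no task descriptor extraction exists yet. As a purely illustrative sketch, assume each chunk and the current task carry a small set of tags and that the modifier boosts embedding similarity by their Jaccard overlap. The tag representation, the `weight` parameter, and the multiplicative form are assumptions for this example, not the paper's formula.

```python
def task_conditioned_score(similarity: float, chunk_tags: set[str],
                           task_tags: set[str], weight: float = 0.3) -> float:
    """Boost a base similarity score by tag overlap with the current task."""
    if not chunk_tags or not task_tags:
        return similarity  # no task signal: fall back to plain similarity
    overlap = len(chunk_tags & task_tags) / len(chunk_tags | task_tags)
    return similarity * (1.0 + weight * overlap)


# Hypothetical candidates: (chunk id, embedding similarity, tags).
candidates = [
    ("c1", 0.80, {"email", "draft"}),
    ("c2", 0.75, {"cron", "debugging"}),
]
current_task = {"cron", "debugging"}

ranked = sorted(
    ((cid, task_conditioned_score(sim, tags, current_task))
     for cid, sim, tags in candidates),
    key=lambda pair: pair[1],
    reverse=True,
)
```

With these sample values the chunk tagged for the current task overtakes the one with higher raw similarity, which is the behavior the feature is meant to produce.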
⭐ Follow the work on GitHub
MNEMOSYNE is part of the Tinkerclaw open-source stack. Star the repo to follow along as the scaffolded features land.
Read the paper

20 pages · the four hooks, the concept-index complexity analysis, the contradiction gate, and the bi-temporal validity layer · PDF is v1.4; paper text is at v1.5
Was this useful?
We’re building these in the open and we want your read on them. Did this land — 👍 or 👎? What would you want the next paper to dig into? Tell us in the comments below.
