Fractal Reasoning: Multi-Resolution Memory and Self-Similar Metacognition for LLM Agents

Your agent’s memory is flat: one giant pile of chunks, all at the same zoom level. It can fetch a single fact or a vague gist, but never both, and never the level in between. Knowledge isn’t flat. Topics nest inside documents and concepts nest inside topics, the same shape repeating at every scale. Fractal Reasoning brings that structure to AI. The Fractal Memory Index stores and retrieves memory at many resolutions at once, like zooming a map from street level to continent level. Fractal Metacognition applies the same trick to thinking, reflecting before the task, on the task, on the strategy behind it, and across strategies. Part of our open Building Jarvis series.

📄 Read the full paper (PDF) →

Abstract (v8.1 · June 2026)

LLM agents reason and remember at a single scale, yet the failures that matter — a wrong answer, a habit of wrong answers, a system that produces the habit — live at different scales of the same structure. We propose Fractal Reasoning: one reflective operation, observe – evaluate – adapt, applied unchanged at every cognitive scale, the way a living body runs the same heal-the-wound reflex on a scratch, an infection, and an immune deficiency.

The framework is grounded in a deployed personal-agent system. We present the 216-line reflection doctrine that ran in production (included verbatim as Appendix A) and a post-mortem of how its prompt-only implementation silently died: first phoned-in output, then fabricated telemetry, and finally a bootstrap that loaded the doctrine and deliberately discarded it. We then describe the corrected architecture that replaced it. A parallel reflection lane runs a cheap, always-on triage pass. An escalation lane acts under structural (not prompted) capability limits, enforced by a tool-layer deny the worker cannot evade; the same primitive is now demonstrably live in a sister danger-floor subsystem. An append-only results ledger makes the reflection layer itself measurable. A ratification queue holds the small class of self-modifications a system should not apply to its own host unsupervised.
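To make "structural, not prompted" concrete, here is a minimal Python sketch of a tool-layer deny paired with a results ledger. It is an illustration under assumptions, not the production code: the names Lane, ToolLayer, ToolDenied, read_notes, and write_note are hypothetical, and an in-memory list stands in for a real append-only store.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Tuple


class ToolDenied(Exception):
    """Raised by the tool layer itself when a lane lacks a capability."""


@dataclass(frozen=True)
class Lane:
    name: str
    allowed_tools: FrozenSet[str]


@dataclass
class ToolLayer:
    tools: Dict[str, Callable[..., object]]
    # Record of every attempted call as (lane, tool, outcome).
    # Assumption: a real ledger would be append-only at the storage level.
    ledger: List[Tuple[str, str, str]] = field(default_factory=list)

    def call(self, lane: Lane, tool: str, *args: object) -> object:
        # The check runs below the model, so no prompt can talk its way past it.
        if tool not in lane.allowed_tools:
            self.ledger.append((lane.name, tool, "denied"))
            raise ToolDenied(f"{lane.name} may not call {tool}")
        result = self.tools[tool](*args)
        self.ledger.append((lane.name, tool, "ok"))
        return result


# Hypothetical tools and lanes, for illustration only.
notes: List[str] = []
layer = ToolLayer(tools={
    "read_notes": lambda: list(notes),
    "write_note": lambda text: notes.append(text),
})
TRIAGE = Lane("triage", frozenset({"read_notes"}))  # read-only lane
ESCALATION = Lane("escalation", frozenset({"read_notes", "write_note"}))

layer.call(ESCALATION, "write_note", "fix applied")
try:
    layer.call(TRIAGE, "write_note", "should never land")
except ToolDenied:
    pass  # the attempt is still recorded in the ledger
```

In this sketch the triage lane is read-only because the write tool is missing from its allow-list, not because an instruction asked it to behave. Attribution comes from the ledger entries, which are written by the tool layer rather than by the model.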

A live cache probe validates the architecture’s economics: a forked conversation prefix is served warm (155,495 cached tokens read against 10,877 written), so reflecting on a turn costs a fraction of re-reading it. The second half of the report carries forward the Fractal Memory Index (FMI) — buffered consolidation, Hilbert-curve multi-resolution indexing, and IFS semantic compression — as the storage-side instance of the same self-similarity thesis, condensed here to its load-bearing claims and its central open experiment.
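The probe’s numbers can be turned into a single ratio. The sketch below assumes the warm share is defined as cached tokens read divided by all prefix tokens (read plus written); the function name warm_fraction is hypothetical, and this page does not say whether the 40-session figure reported elsewhere uses the same definition. It deliberately says nothing about price, since that depends on the provider.

```python
def warm_fraction(cached_read: int, cache_written: int) -> float:
    """Share of prefix tokens served from cache rather than freshly written."""
    total = cached_read + cache_written
    if total == 0:
        raise ValueError("no prefix tokens recorded")
    return cached_read / total


# Token counts quoted for the single live cache probe.
probe = warm_fraction(155_495, 10_877)
print(f"{probe:.1%}")  # prints 93.5%
```

Under that assumed definition, roughly 93.5% of the forked prefix in this one probe was served from cache.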

Every deployment claim is tagged with its evidence class: live-and-verified, code-present-but-unverified, design-of-record, or pure theory. This discipline is itself an application of the framework: the report’s previous edition failed it in three places, documented in §8.

The claims, in numbers

O(1): amortized cost per write (vs. O(n log n) tree rebuilds in hierarchical-summary systems)

O(L·log_B n / ε): cost to retrieve across all L resolution levels in one Bε-tree traversal

O(D) → O(d): per-memory storage drops from the ambient embedding dimension D to the intrinsic fractal dimension d

81.8%: warm-token ratio across 40 live sessions. The fork-session transport serves the prior context warm, so reflection costs a fraction of re-reading it.

0.875 / 0.896: AUROC for k-NN novelty and clause-cosine incongruity, the structural signals that were validated in the live AEGIS experiment. Contrast with 0.286 (below chance) for the supervised danger head on a frozen backbone.

Claims from the paper, stated here without the proofs — they’re in the PDF.
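For readers who want to see what a k-NN novelty score and an AUROC look like in code, here is a small Python sketch. It is a toy under assumptions: the distance metric (Euclidean), the value of k, the function names, and the 2-D points are all made up for illustration and have no relation to the AEGIS data or its reported numbers.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def knn_novelty(query: Vector, memory: List[Vector], k: int = 3) -> float:
    """Mean Euclidean distance from the query to its k nearest stored vectors."""
    if not memory:
        raise ValueError("memory is empty")
    nearest = sorted(math.dist(query, m) for m in memory)[:k]
    return sum(nearest) / len(nearest)


def auroc(scores: List[float], labels: List[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy memory and queries; label 1 marks a query that is meant to be novel.
memory = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
queries = [(0.5, 0.5), (0.1, 0.9), (5.0, 5.0), (-4.0, 3.0)]
labels = [0, 0, 1, 1]

scores = [knn_novelty(q, memory) for q in queries]
toy_auroc = auroc(scores, labels)
```

An AUROC of 0.5 means the score ranks novel and familiar items no better than a coin flip, which is why a value such as 0.286 counts as below chance: the signal is ranking items in the wrong order more often than not.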

How it works, in one minute

Write cheap, organize later. New memories land in a write buffer in O(1); a periodic consolidation pass clusters, summarizes, and compresses them in bulk — so ingestion never stalls on reindexing.
Index at every zoom level at once. A Hilbert space-filling curve flattens the embedding space while keeping neighbors near each other, so one index holds raw events, topic clusters, and abstract concepts together — query the gist or the exact detail from the same structure.
Compress by self-similarity, not by truncation. Embeddings live on a low-dimensional fractal manifold; Iterated Function System codes capture that structure, shrinking storage toward the intrinsic dimension instead of the ambient one.
One traversal, any granularity. A multi-resolution query walks the index once and returns ranked results at whatever abstraction level the task asked for — no separate flat-vs-hierarchical retrieval paths.
Think in self-similar levels. Fractal Metacognition runs the same reflective operation at four scales — a pre-task “do I even have what I need?” check, the task itself, the strategy behind it, and patterns across strategies — each surfacing insight invisible from the level below.
Structure enforces; prompting expresses. The key lesson from production: a 216-line reflection doctrine that ran for months was silently killed by one unrelated code change, and nothing noticed — because nothing measured it. The corrected architecture makes each rung’s properties structural: triage is read-only by tool denial, attribution derives from tool-call records not model prose, and liveness is provable from a ledger invariant.
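The first two steps and the single-traversal query can be sketched together in a few lines of Python. This is a toy under loud assumptions, not the paper’s implementation: memories are placed on a 2-D integer grid instead of a real embedding space, a sorted list with binary search stands in for the Bε-tree, consolidation only sorts (no clustering, summarizing, or IFS compression), and the names hilbert_key and FractalIndexSketch are hypothetical.

```python
import bisect
from typing import List, Tuple


def hilbert_key(x: int, y: int, order: int) -> int:
    """Map grid cell (x, y) on a 2^order x 2^order grid to its Hilbert index."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the curve stays continuous
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


class FractalIndexSketch:
    """Toy index: O(1) buffered writes, bulk consolidation, per-level lookup."""

    def __init__(self, order: int = 4) -> None:
        self.order = order
        self.buffer: List[Tuple[int, int, str]] = []  # recent, unsorted writes
        self.keys: List[int] = []   # sorted Hilbert keys (stand-in for a tree)
        self.items: List[str] = []  # payloads aligned with self.keys

    def write(self, x: int, y: int, text: str) -> None:
        # Ingestion only appends; no reindexing happens here.
        self.buffer.append((x, y, text))

    def consolidate(self) -> None:
        # Bulk pass: key the buffered writes and merge them into sorted order.
        merged = list(zip(self.keys, self.items))
        merged += [(hilbert_key(x, y, self.order), t) for x, y, t in self.buffer]
        merged.sort()
        self.keys = [k for k, _ in merged]
        self.items = [t for _, t in merged]
        self.buffer.clear()

    def query(self, x: int, y: int, level: int) -> List[str]:
        """Consolidated items in the query's cell at `level` (0 = whole space, order = finest)."""
        shift = 2 * (self.order - level)  # each coarser level drops two key bits
        cell = hilbert_key(x, y, self.order) >> shift
        lo = bisect.bisect_left(self.keys, cell << shift)
        hi = bisect.bisect_left(self.keys, (cell + 1) << shift)
        return self.items[lo:hi]


idx = FractalIndexSketch(order=4)
idx.write(1, 1, "coffee order on Monday")
idx.write(2, 1, "coffee order on Tuesday")
idx.write(12, 13, "quarterly tax deadline")
idx.consolidate()

fine = idx.query(1, 1, level=4)    # exact cell only
coarse = idx.query(1, 1, level=2)  # the 4 x 4 block around it
```

The design choice worth noticing is that zooming out is just dropping low-order bits of the key: every aligned run of consecutive Hilbert keys covers one square block of the grid, so the same sorted structure answers both the exact-detail query and the gist query. In this toy, items still sitting in the write buffer are not searched until the next consolidation.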

Ecosystem: composable, not competing

Two shipped open-source projects converge on the same structural choices — one on the reflection side, one on memory compression. The paper discusses both as complementary instances of the same thesis, not as alternatives.

doubt-driven-development

addyosmani/agent-skills · ★ 56.8k

A skill shipping a CLAIM → EXTRACT → DOUBT → RECONCILE → STOP loop — an external rediscovery of this paper’s triage→fix→verify lane. Three structural moves map one-to-one: the fresh-context adversarial reviewer (this paper’s forked separate-run-identity triage lane), EXTRACT stripping reasoning to artifact+contract (this paper’s “model prose is narrative, never telemetry”), and STOP criteria bounding escalation the way §4.4’s governor does. What it adds: a clean, named, copyable contract for the L2 rung. What the paper adds: the cross-scale ledger, N≥2 recurrence rule, and the liveness invariant that survives silent severance.

headroom

chopratejas/headroom · ★ 24.7k · Apache-2.0

Reversible CCR (Compress-Cache-Retrieve): keep originals cached, hand the LLM a compressed view, retrieve full content on demand. Ships per-content-type compressors (JSON, code AST, logs, diffs) with measured 60–95% token savings at ~0 accuracy delta across GSM8K / TruthfulQA / SQuAD / BFCL. Structurally convergent with FMI’s buffered-consolidation story — and it raises the bar on FMI’s IFS compression claim: “compress-but-keep-recoverable” is no longer a hypothesis, it’s production. What IFS must add is the self-similarity property headroom never claims: one code that holds across event, episode, and concept scales.
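The compress-cache-retrieve pattern described above can be illustrated generically. The Python sketch below is not headroom’s API and makes no claim about how that project works internally: the class ReversibleStore, its methods, and the handle format are hypothetical, and plain truncation stands in for a real content-aware compressor.

```python
import hashlib
from typing import Dict, Tuple


class ReversibleStore:
    """Keep originals cached, hand out a short view, restore on demand."""

    def __init__(self, preview_chars: int = 40) -> None:
        self.preview_chars = preview_chars
        self._originals: Dict[str, str] = {}

    def compress(self, content: str) -> Tuple[str, str]:
        # The handle is a content hash, so identical content maps to one entry.
        handle = hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
        self._originals[handle] = content
        # Assumption: truncation is only a placeholder for a real compressor.
        view = f"[ref:{handle}] {content[: self.preview_chars]}"
        return handle, view

    def retrieve(self, handle: str) -> str:
        # Nothing was discarded, so retrieval returns the exact original.
        return self._originals[handle]


store = ReversibleStore(preview_chars=20)
log_line = "2026-06-01 12:00:00 worker-7 retry budget exhausted after 5 attempts"
handle, view = store.compress(log_line)
restored = store.retrieve(handle)
```

The model would see only the short view, and the full text comes back through the handle when a task needs it. What this sketch does not have is the property the paragraph above asks of IFS: a single code that holds across event, episode, and concept scales.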

⭐ Following the Building Jarvis series?

The code behind every paper ships open-source. Star the repo, follow the commits, or just see what’s running in production.

github.com/globalcaos/tinkerclaw →

Read the paper

20 pages · the FMI architecture, complexity analysis, the RAPTOR comparison, Fractal Metacognition, and the empirical research agenda
