Identity Persistence: Keeping an LLM Agent's Personality Stable Across Sessions, Model Swaps, and Restarts

Your agent keeps every fact and still stops sounding like itself — the voice, the wit, the stance all quietly drain away as the context window fills. That failure has a name: persona erosion. Identity Persistence keeps the persona pinned every turn, compresses task content without flattening the voice, and catches drift in about two turns — so the agent stays itself across long sessions, model swaps, and full restarts. Part of our open Building Jarvis series.

📄 Read the full paper (PDF) →

Abstract

Persistent LLM agents lose their personality as context windows fill — a failure mode we call persona erosion. Identity Persistence solves this through three interlocking mechanisms: priority-aware injection ensures personality is present every turn; identity-preserving compaction compresses task content while retaining persona markers; and adaptive two-signal drift detection catches personality shifts in ~2 turns, before they compound into visible degradation.

Because the persona lives in editable state outside any model’s weights, the same machinery keeps identity stable when the underlying model is swapped mid-conversation — a property training-time methods cannot offer — and lets an external reflection layer rewrite the persona overnight, so identity not only persists but improves. A discrete-time Lyapunov analysis provides closed-form variance bounds. Component benchmarks confirm 50-turn SyncScore stability (mean 0.977), drift recovery from 0.027 to 0.980, and 442× on/off-persona separation. Human evaluation (30 logs, 3 judges, Krippendorff’s α = 0.81) yields consistency of 4.2 ± 0.4 versus 2.6 ± 0.7 baseline. In over 30 days of continuous production deployment spanning model switches, context resets, and hundreds of sessions, the agent required no manual persona correction.
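For readers who have not met a discrete-time Lyapunov argument, a scalar toy model shows the general shape of a closed-form variance bound. This is not the paper’s derivation. The drift variable d_t, the correction gain k, and the zero-mean noise w_t (variance σ², independent of d_t) are all assumptions made here purely for illustration.

```latex
\begin{aligned}
d_{t+1} &= (1-k)\,d_t + w_t, \qquad 0 < k < 2,\quad \mathbb{E}[w_t] = 0,\quad \operatorname{Var}(w_t) = \sigma^2,\\
V(d) &= d^2 \;\Rightarrow\; \mathbb{E}\!\left[V(d_{t+1}) \mid d_t\right] = (1-k)^2\,V(d_t) + \sigma^2,\\
\mathbb{E}\!\left[d_t^2\right] &= (1-k)^{2t}\,d_0^2 + \sigma^2\,\frac{1-(1-k)^{2t}}{k(2-k)} \;\xrightarrow[t\to\infty]{}\; \frac{\sigma^2}{k(2-k)}.
\end{aligned}
```

In this toy setting the expected squared drift contracts geometrically toward the noise floor σ²/(k(2−k)). The bounds for the actual system are in the PDF.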

The claims, in numbers

442×: separation between on-persona and off-persona responses (the detector isn’t guessing)

0.027 → 0.980: drift recovery after a deliberate persona break (it pulls itself back to target)

4.2 vs 2.6: human-judged consistency vs a no-persona-layer baseline (3 judges, α = 0.81)

Claims from the paper, stated here without the proofs — the methodology, benchmarks, and derivations are all in the PDF.

How it works, in one minute

The whole design rests on one move: separate the persona from the task, then protect each differently.

Pin the persona, every turn. The persona block lives in a non-evictable top tier and is injected first, so it can’t be diluted as the window fills. It is capped at 5% of the context so it never crowds out the actual work.
Compact the task, not the soul. When old context is compressed, a dual loss preserves both the factual content and the features in a designated persona feature space, so the agent’s voice survives a compaction pass instead of being smoothed into generic prose. The frozen encoder behind that feature space also reveals a hard scoping boundary: it carries style and voice cleanly (AUROC 0.875 for novelty detection, 0.896 for incongruity), but a supervised harmfulness head on the same encoder scored AUROC 0.286, well below the 0.5 chance level. That measurement is why safety lives outside the persona feature space entirely, as a symbolic floor beneath the persona loop, not as another axis to balance.
Catch drift with two signals. Sparse user corrections (precise but rare) are fused with dense automated probes (constant but noisier); when users go quiet, the system leans harder on probes — so shifts get caught in about two turns either way.
Identity outlives the model. Because the persona is editable state outside the weights, swapping the underlying model mid-conversation doesn’t reset who the agent is — and a nightly reflection layer can rewrite the persona so it improves over time.
Provably stable. A discrete-time Lyapunov analysis gives closed-form bounds on how far the persona can wander, so correction converges instead of oscillating.
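The first and third items above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper’s implementation: the names `assemble_context` and `fused_drift`, the whitespace tokenizer, and the half-life weighting rule are all hypothetical. Only the 5% persona cap and the idea of leaning on probes when users go quiet come from the text.

```python
from typing import List, Optional

PERSONA_BUDGET = 0.05  # share of the window reserved for the persona (the 5% cap from the text)


def count_tokens(text: str) -> int:
    """Stand-in tokenizer (whitespace split); a real system would use the model's tokenizer."""
    return len(text.split())


def assemble_context(persona: str, task_turns: List[str], window_tokens: int) -> List[str]:
    """Put the persona first, then keep as many of the newest task turns as still fit."""
    persona_cost = count_tokens(persona)
    if persona_cost > PERSONA_BUDGET * window_tokens:
        raise ValueError("persona block exceeds its share of the context window")
    remaining = window_tokens - persona_cost
    kept: List[str] = []
    for turn in reversed(task_turns):  # newest first; only task turns are ever dropped
        cost = count_tokens(turn)
        if cost > remaining:
            break
        kept.append(turn)
        remaining -= cost
    return [persona] + kept[::-1]  # restore chronological order after the persona


def fused_drift(probe: float, user: Optional[float], turns_since_user: int,
                half_life: float = 5.0) -> float:
    """Blend a dense probe drift score with a sparse user-correction score (both in [0, 1]).

    Assumed rule: the user signal's weight halves every `half_life` turns of silence,
    so the estimate falls back on the probes when no correction has arrived recently.
    """
    if user is None:
        return probe
    w_user = 0.5 ** (turns_since_user / half_life)
    return w_user * user + (1.0 - w_user) * probe
```

Calling `assemble_context(persona, turns, window_tokens)` always returns the persona as the first element, however many task turns had to be dropped. `fused_drift(0.2, 1.0, 0)` trusts a fresh user correction fully, and the same call with `turns_since_user=5` splits the weight evenly between the two signals.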

Where this fits in the ecosystem

Identity Persistence occupies the identity layer of a larger cognitive stack. One nearby open-source project sits on a composable, not competing, axis:

headroom — chopratejas/headroom · ~24.7k ★

Headroom performs reversible Compress-Cache-Retrieve (CCR): factual originals are cached and retrieved on demand, so compression is lossless-by-recovery rather than lossy. It is persona-agnostic: its loss preserves task answerability and has no notion of a protected persona feature space. Identity-preserving compaction (IPC) contributes exactly the term headroom lacks: the E_φ persona-distance gate that re-runs summarization when style has eroded. The two compose naturally: headroom owns the reversible factual track, and IPC adds only the persona-preservation pass on top.
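A persona-distance gate of the kind described above might look roughly like this. It is a hedged sketch, not the paper’s E_φ: the bag-of-words `embed` function stands in for the frozen encoder, and `gated_compaction`, the 0.7 threshold, the retry limit, and the `summarize(source, attempt)` callable are all hypothetical names and values chosen for illustration.

```python
import math
from collections import Counter
from typing import Callable, Dict, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy stand-in for a frozen encoder: unit-normalized bag-of-words counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def persona_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine distance between two sparse unit vectors (0 = same direction)."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    return 1.0 - dot


def gated_compaction(source: str, persona_ref: str,
                     summarize: Callable[[str, int], str],
                     threshold: float = 0.7,
                     max_attempts: int = 3) -> Tuple[str, int]:
    """Re-run the summarizer until the summary is close enough to the persona reference.

    `summarize(source, attempt)` is supplied by the caller; the attempt index lets it
    push harder on style with each retry. Returns the summary and the attempts used.
    """
    target = embed(persona_ref)
    summary = ""
    for attempt in range(max_attempts):
        summary = summarize(source, attempt)
        if persona_distance(embed(summary), target) <= threshold:
            return summary, attempt + 1
    return summary, max_attempts  # give up and keep the last attempt
```

The gate only decides whether a summary is accepted. Caching and retrieving the factual originals would stay with a tool such as headroom, which is what makes the two tracks composable.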

🔨 Built in the open

The identity layer is part of tinkerclaw, the open-source agent platform the Building Jarvis series builds toward. Star or follow for updates as the J-series implementations ship.

⭐ github.com/globalcaos/tinkerclaw

Read the paper


📄 Read the full paper (PDF) →

18 pages · the full architecture, the compaction loss, the drift-detection math, the stability proof, and the production results

