Identity Persistence: Keeping an LLM Agent’s Personality Stable Across Sessions, Model Swaps, and Restarts

Your agent keeps every fact and still stops sounding like itself — the voice, the wit, the stance all quietly drain away as the context window fills. That failure has a name: persona erosion. Identity Persistence keeps the persona pinned every turn, compresses task content without flattening the voice, and catches drift in about two turns — so the agent stays itself across long sessions, model swaps, and full restarts. Part of our open Building Jarvis series.

📄 Read the full paper (PDF) →

Abstract

Persistent LLM agents lose their personality as context windows fill — a failure mode we call persona erosion. Identity Persistence solves this through three interlocking mechanisms: priority-aware injection ensures personality is present every turn; identity-preserving compaction compresses task content while retaining persona markers; and adaptive two-signal drift detection catches personality shifts in ~2 turns, before they compound into visible degradation.

Because the persona lives in editable state outside any model’s weights, the same machinery keeps identity stable when the underlying model is swapped mid-conversation — a property training-time methods cannot offer — and lets an external reflection layer rewrite the persona overnight, so identity not only persists but improves. A discrete-time Lyapunov analysis provides closed-form variance bounds. Component benchmarks confirm 50-turn SyncScore stability (mean 0.977), drift recovery from 0.027 to 0.980, and 442× on/off-persona separation. Human evaluation (30 logs, 3 judges, Krippendorff’s α = 0.81) yields consistency of 4.2 ± 0.4 versus 2.6 ± 0.7 baseline. In over 30 days of continuous production deployment spanning model switches, context resets, and hundreds of sessions, the agent required no manual persona correction.

The claims, in numbers

  • 442×: separation between on-persona and off-persona responses (the detector isn’t guessing)
  • 0.027 → 0.980: drift recovery after a deliberate persona break (it pulls itself back to target)
  • 4.2 vs 2.6: human-judged consistency vs a no-persona-layer baseline (3 judges, α = 0.81)

Claims from the paper, stated here without the proofs — the methodology, benchmarks, and derivations are all in the PDF.

How it works, in one minute

The whole design rests on one move: separate the persona from the task, then protect each differently.

  • Pin the persona, every turn. The persona block lives in a non-evictable top tier, injected first so it can’t get diluted as the window fills — and capped at ≤5% of context so it never crowds out the actual work.
  • Compact the task, not the soul. When old context is compressed, a dual loss preserves both the factual summary and a designated persona feature space, so the agent’s voice survives a compaction pass instead of being smoothed into generic prose. The frozen encoder behind that feature space also reveals a hard scoping boundary: it carries style and voice cleanly (AUROC 0.875 for novelty detection, 0.896 for incongruity), but a supervised harmfulness head on the same encoder scored AUROC 0.286, which is below the 0.5 chance level. That measurement is why safety lives outside the persona feature space entirely, as a symbolic floor beneath the persona loop, not as another axis to balance.
  • Catch drift with two signals. Sparse user corrections (precise but rare) are fused with dense automated probes (constant but noisier); when users go quiet, the system leans harder on probes — so shifts get caught in about two turns either way.
  • Identity outlives the model. Because the persona is editable state outside the weights, swapping the underlying model mid-conversation doesn’t reset who the agent is — and a nightly reflection layer can rewrite the persona so it improves over time.
  • Provably stable. A discrete-time Lyapunov analysis gives closed-form bounds on how far the persona can wander, so correction converges instead of oscillating.
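
To make the first and third bullets concrete, here is a minimal Python sketch. It is not the paper's implementation: the whitespace token counter, the function names `assemble_context` and `fuse_drift`, and the exponential `half_life` weighting are all assumptions made for illustration. Only the ≤5% persona cap and the idea of leaning on probes when users go quiet come from the post.

```python
from typing import List, Optional

PERSONA_CAP = 0.05  # persona block limited to 5% of the context budget (from the post)


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: whitespace-separated words.
    return len(text.split())


def assemble_context(persona: str, history: List[str], budget: int) -> List[str]:
    """Pin the persona first, then fill what is left with the newest history."""
    persona_tokens = count_tokens(persona)
    if persona_tokens > PERSONA_CAP * budget:
        raise ValueError("persona block exceeds its 5% share of the context budget")
    remaining = budget - persona_tokens
    kept: List[str] = []
    for message in reversed(history):  # walk from newest to oldest
        cost = count_tokens(message)
        if cost > remaining:
            break  # older task content is evicted; the persona never is
        kept.append(message)
        remaining -= cost
    return [persona] + kept[::-1]  # persona first, history back in original order


def fuse_drift(correction: Optional[float], probe: float,
               turns_since_correction: int, half_life: float = 3.0) -> float:
    """Blend a sparse user-correction score with a dense probe score.

    Both scores are assumed to lie in [0, 1], higher meaning more drift.
    The weight on the last correction halves every `half_life` turns
    (a hypothetical schedule), so quiet users shift trust toward probes.
    """
    if correction is None:
        return probe  # no correction seen yet: rely on probes alone
    weight = 0.5 ** (turns_since_correction / half_life)
    return weight * correction + (1.0 - weight) * probe
```

In this sketch a fresh correction dominates the fused score, and a correction three turns old counts for half. A real system would swap in the model's tokenizer and a calibrated weighting.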

Where this fits in the ecosystem

Identity Persistence occupies the identity layer of a larger cognitive stack. One nearby open-source project complements it rather than competing with it:

headroom — chopratejas/headroom · ~24.7k ★

Performs reversible Compress-Cache-Retrieve (CCR): factual originals are cached and retrieved on demand, so compression is lossless-by-recovery rather than lossy. Headroom is persona-agnostic: its loss preserves task answerability and has no notion of a protected persona feature space. Identity-preserving compaction (IPC) contributes exactly the term headroom lacks: the Eφ persona-distance gate that re-runs summarization when style has eroded. The two compose naturally: headroom owns the reversible factual track; IPC contributes only the persona-preservation pass on top.
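
A persona-distance gate of this kind can be sketched in a few lines of Python. Everything here is hypothetical: the marker-counting `encode_style` is a toy stand-in for the frozen encoder, and the `0.3` threshold, the function names, and the list-of-summarizers retry strategy are assumptions for illustration. None of it is headroom's API or the paper's code.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical style features; a real system would use a learned encoder.
STYLE_MARKERS = ("!", "?", "honestly", "frankly", "well", "indeed")


def encode_style(text: str) -> List[float]:
    """Toy encoder: counts of a few style markers in the text."""
    lowered = text.lower()
    return [float(lowered.count(marker)) for marker in STYLE_MARKERS]


def persona_distance(a: str, b: str) -> float:
    """Cosine distance between style vectors; 1.0 if either has no markers."""
    va, vb = encode_style(a), encode_style(b)
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(x * x for x in vb))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    dot = sum(x * y for x, y in zip(va, vb))
    return 1.0 - dot / (norm_a * norm_b)


def gated_compaction(original: str,
                     summarizers: Sequence[Callable[[str], str]],
                     threshold: float = 0.3) -> str:
    """Re-run summarization until a summary passes the persona gate.

    Summarizers are tried in order; the first summary within `threshold`
    is accepted. If none passes, the closest one seen is returned.
    """
    if not summarizers:
        raise ValueError("at least one summarizer is required")
    best, best_distance = "", math.inf
    for summarize in summarizers:
        candidate = summarize(original)
        distance = persona_distance(original, candidate)
        if distance <= threshold:
            return candidate  # style survived: accept this summary
        if distance < best_distance:
            best, best_distance = candidate, distance
    return best
```

The factual track stays with whatever compressor produced the summary; the gate only decides whether the voice survived and, if not, asks for another pass.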

🔨 Built in the open

The identity layer is part of tinkerclaw, the open-source agent platform the Building Jarvis series builds toward. Star or follow for updates as the J-series implementations ship.

⭐ github.com/globalcaos/tinkerclaw

Read the paper


📄 Read the full paper (PDF) →

18 pages · the full architecture, the compaction loss, the drift-detection math, the stability proof, and the production results

Was this useful?

We’re building these in the open and we want your read on them. Did this land — 👍 or 👎? What would you want the next paper to dig into? Tell us in the comments below.
