PREFRONTAL: Giving Your Agent an Executive Function with a Recipe Execution Substrate

Your model is a brilliant worker and a terrible executive. Left alone, it edits before it reproduces, codes before it designs, and declares victory before it verifies. PREFRONTAL is the missing executive layer: a recipe execution substrate that, on every turn, recognizes the task, picks or assembles the right playbook, runs your agent inside it, watches the work, and spends effort in proportion to how hard the turn actually is. Part of our open Building Jarvis series.

📄 Read the full paper (PDF) →

Abstract — v4.7, 14 June 2026

The dorsolateral prefrontal cortex does not do tasks. It holds the plan, selects the right behavioural program for the moment, sequences sub-routines, suppresses the impulsive response, watches whether the program is working, and re-allocates effort when it is not. It is the brain’s executive: not a worker, but the structure that makes work coherent. We argue that a deployed language agent needs the same thing, and that the right shape for it is neither a guard rail nor a manager-agent but a recipe execution substrate — a layer that turns each turn into an executive-function cognitive cycle of plan → match/compose/author → execute-with-loops → observe → adapt-effort.

A recipe is a structured workflow: ordered steps, tool hints, success gates, and failure handlers, written as a portable document rather than as code. The substrate does six things with recipes that together make them an execution engine rather than a static library. (1) It matches the user’s intent to recipes with a fuzzy, stemmed, edit-distance-tolerant scorer that reports a confidence tier and seeds a plan automatically at the start of every turn. The scorer admits negative as well as positive evidence, so a recipe can declare the prompts for which it must not be chosen. (2) It composes recipes from recipes: a recipe may merge another recipe’s steps into its own at plan-build time, or call a sub-recipe at runtime, both cycle- and depth-guarded, exactly as functions call functions. (3) When nothing matches, it authors a new recipe on the fly, validates it, persists it, and makes it matchable on the very next turn, closing the loop on its own coverage gaps. (4) It supports bounded loops over steps (run N times, until-dry, or until a marker appears). Loops are the element that lifts recipes from straight-line checklists to a structured execution substrate and closes the long-standing structural gap against hand-coded agent workflows. (5) It routes effort dynamically, classifying each turn as trivial, standard, deep, or ultra and using that classification to drive the turn: model tier, thinking budget, orchestration mode, and token generosity all scale to the work. (6) It makes all of this observable through a dedicated RECIPES panel that renders the provenance trail (searched → matched → merged → composed → authored), per-subagent vitals, and the compact-versus-expanded decision trail, while keeping orchestration mechanics out of the substantive chat.
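To make capability (2) concrete, here is a minimal Python sketch of build-time composition with cycle and depth guards. It is an illustration under stated assumptions, not the substrate's implementation: the `Recipe` dataclass, the `expand` function, the example recipe names, and the `MAX_DEPTH` value of 8 are all hypothetical. Only the idea of merging another recipe's steps at plan-build time, guarded against cycles and excessive depth, comes from the text.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 8  # assumed limit; the source does not state the real value


@dataclass
class Recipe:
    name: str
    steps: list[str]
    # Names of recipes whose steps are merged in at plan-build time.
    composes: list[str] = field(default_factory=list)


def expand(name: str, library: dict[str, Recipe],
           stack: tuple[str, ...] = ()) -> list[str]:
    """Return the flat step list for `name`, merging composed recipes first."""
    if name in stack:
        # Cycle guard: this recipe is already being expanded further up.
        raise ValueError("cycle: " + " -> ".join(stack + (name,)))
    if len(stack) >= MAX_DEPTH:
        # Depth guard: refuse to nest compositions without bound.
        raise ValueError(f"composition deeper than {MAX_DEPTH}")
    recipe = library[name]
    steps: list[str] = []
    for child in recipe.composes:
        steps.extend(expand(child, library, stack + (name,)))
    steps.extend(recipe.steps)
    return steps


# Hypothetical two-recipe library: fixing a bug reuses the reproduction recipe.
library = {
    "reproduce-bug": Recipe("reproduce-bug", ["write failing test"]),
    "fix-bug": Recipe("fix-bug", ["patch code", "run tests"],
                      composes=["reproduce-bug"]),
}
plan = expand("fix-bug", library)
```

The guards work the same way a call stack does: the `stack` tuple records which recipes are currently being expanded, so a recipe that reaches itself again is rejected instead of recursing forever.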

The thesis is that intelligence in a deployed agent is not primarily a property of the model. It is a property of the structure the model executes inside — and that structure is most powerful when it is composable, self-authoring, looping, effort-adaptive, and observable. This paper specifies that substrate, grounds the brain metaphor in concrete mechanism, gives language-agnostic pseudocode for each of the six capabilities, and reports which of the substrate’s harder extensions — durable resume, a self-improving library, external acquisition, and a versioned marketplace — are implemented and running, and which one — searched execution — remains specified but unbuilt. Throughout, we position the substrate against the fresh open-source ecosystem it shares a problem space with — chopratejas/headroom’s reversible context-compression and addyosmani/agent-skills’ anti-trigger and load-constraint discipline — and fold their stronger ideas into the substrate’s own seams where they sharpen a claim.

How it works, in one minute

It seeds a plan before your agent’s first token. A local, in-memory fuzzy matcher (stemmed, prefix- and edit-distance-tolerant, weighted by where the match lands) scores the prompt against every recipe’s metadata in a few hundred microseconds — no model call, no network — so the right playbook is already loaded when work starts.
Recipes call recipes — and typed library primitives. A recipe can merge another’s steps at build time (composes:), call a sub-recipe at runtime (uses:), or invoke a vetted stdlib routine (invoke skill:) with a declared output schema that is validated before the result is persisted — all cycle- and depth-guarded. Workflows reuse workflows the way functions call functions.
When nothing fits, it writes a new recipe mid-conversation. The substrate authors a candidate — by wiring existing skill primitives into a runnable plan, or by synthesis — validates it against a schema so a corrupt entry can’t poison the library, persists it under a never-overwrite rule, and makes it matchable on the very next turn. Coverage gaps heal themselves.
Bounded loops turn checklists into programs. A step can repeat a fixed count of N times, until-dry (the worker signals “nothing new” in plain prose), or until a named marker appears. Every loop is clamped to 5 iterations by default, with a hard ceiling of 25, so it is expressive enough to iterate and constitutionally incapable of spinning forever.
Effort scales to the turn — and is enforced, not advisory. Each turn is classified trivial → standard → deep → ultra from cheap language signals. That tier drives model size, thinking budget, orchestration mode (solo, parallel fan-out, or full workflow with adversarial verification), and token generosity. A per-spawn budget watchdog makes this binding: the worker doesn’t get to ignore the effort lever.
A fresh-context critic doubts the “done.” Ultra-tier turns end with a doubt-driven verification pass that strips the producing reasoning and reviews the artifact cold — structurally harder to fool than asking the same context to grade its own work. Borrowed from addyosmani/agent-skills’ doubt-driven-development discipline, now a native uses: doubt-driven-verify sub-recipe.
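The bounded loops described in the list above can be sketched in a few lines of Python. This is a hedged illustration, not the substrate's code: the names `clamp` and `run_loop`, the mode strings, and the substring check for "nothing new" are assumptions. The default cap of 5 and the hard ceiling of 25 are the values stated in the text.

```python
from typing import Callable, Optional

DEFAULT_CAP = 5    # default clamp stated in the text
HARD_CEILING = 25  # hard ceiling stated in the text


def clamp(requested: Optional[int]) -> int:
    """Clamp a requested iteration count into the range 1..HARD_CEILING."""
    if requested is None:
        return DEFAULT_CAP
    return max(1, min(requested, HARD_CEILING))


def run_loop(step: Callable[[int], str], mode: str = "count",
             limit: Optional[int] = None, marker: str = "") -> list[str]:
    """Run `step` repeatedly; every mode stops at the clamped cap at the latest."""
    outputs: list[str] = []
    for i in range(clamp(limit)):
        out = step(i)
        outputs.append(out)
        # Assumed dryness signal: the worker says "nothing new" in its prose.
        if mode == "until-dry" and "nothing new" in out.lower():
            break
        # Stop as soon as the named marker shows up in the output.
        if mode == "until-marker" and marker and marker in out:
            break
    return outputs


# A request for 1000 passes is cut down to the hard ceiling.
runs = run_loop(lambda i: f"pass {i}", mode="count", limit=1000)

# An until-dry loop stops on the first pass that reports nothing new.
dry = run_loop(lambda i: "Nothing new." if i == 2 else f"found item {i}",
               mode="until-dry")
```

Because the cap is applied in the `range` itself rather than checked inside the body, no combination of mode and worker output can push the loop past the ceiling.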

Composable, not competing: how this fits the OSS ecosystem

PREFRONTAL shares a problem space with two notable open-source projects. Both are treated as composable pieces the substrate folds in, not as alternatives to route around.

🗜 chopratejas/headroom (~24.7k ★)

Reversible Compress-Cache-Retrieve: compresses context artifacts, caches the verbatim original, retrieves it on demand. PREFRONTAL folds CCR into its durable artifact carry-forward spine — making plan resume lossless rather than truncated, with content-type-aware compressors (AST for code, array-crushing for JSON, summary for prose).
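The compress, cache, retrieve cycle can be illustrated with a small Python sketch. It makes loud assumptions: `ArtifactStore`, its method names, and the JSON stand-in format are hypothetical, and the "compression" here is a naive truncated preview rather than the content-type-aware compressors mentioned above. It is not headroom's API or PREFRONTAL's implementation. The point it shows is reversibility: the compact stand-in carries a handle that always leads back to the verbatim original.

```python
import hashlib
import json


class ArtifactStore:
    """Keeps verbatim originals so a compact stand-in can always be expanded."""

    def __init__(self) -> None:
        self._originals: dict[str, str] = {}

    def compress(self, text: str, max_chars: int = 80) -> str:
        """Cache `text` verbatim and return a short stand-in carrying a handle."""
        handle = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._originals[handle] = text
        # Naive placeholder compressor: keep only a truncated preview.
        stand_in = {"handle": handle, "preview": text[:max_chars], "chars": len(text)}
        return json.dumps(stand_in)

    def retrieve(self, stand_in: str) -> str:
        """Return the exact original for a stand-in produced by `compress`."""
        return self._originals[json.loads(stand_in)["handle"]]


store = ArtifactStore()
original = "step log line\n" * 200
stand_in = store.compress(original)   # small enough to carry in context
restored = store.retrieve(stand_in)   # the verbatim original, on demand
```

A resumed plan would carry only `stand_in` forward and call `retrieve` when a step needs the full artifact, which is what makes the carry-forward lossless rather than truncated.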

🧠 addyosmani/agent-skills (~56.8k ★)

Anti-trigger + load-constraint + doubt-driven-development: skills declare what they must not match and where they may safely load. PREFRONTAL adopts both natively — anti-triggers feed the recipe matcher’s negative-evidence path (so a look-alike prompt can veto a wrong recipe); load constraints gate step fan-out to safe contexts; and doubt-driven-development becomes the substrate’s native fresh-context completion critic.
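The negative-evidence path can be sketched as a scorer in which any anti-trigger hit vetoes the recipe. This is a minimal Python illustration under assumptions, not the substrate's matcher: the function names, the crude suffix stemmer, the 0.8 similarity threshold, and the example triggers are all hypothetical, and `difflib.SequenceMatcher` from the standard library stands in for whatever edit-distance measure the real scorer uses.

```python
from difflib import SequenceMatcher


def stem(word: str) -> str:
    """Very crude suffix stripper, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(text: str) -> list[str]:
    return [stem(w) for w in text.lower().split()]


def hits(prompt: str, phrase: str, threshold: float = 0.8) -> bool:
    """True if every token of `phrase` has a close match among the prompt tokens."""
    prompt_tokens = tokens(prompt)
    return all(
        any(SequenceMatcher(None, t, p).ratio() >= threshold for p in prompt_tokens)
        for t in tokens(phrase)
    )


def score(prompt: str, triggers: list[str], anti_triggers: list[str]) -> float:
    """Fraction of triggers hit, or 0.0 if any anti-trigger hits (a veto)."""
    if any(hits(prompt, a) for a in anti_triggers):
        return 0.0
    if not triggers:
        return 0.0
    return sum(hits(prompt, t) for t in triggers) / len(triggers)


# Hypothetical metadata for a bug-fixing recipe.
triggers = ["fix bug", "failing test"]
anti_triggers = ["write documentation"]

wanted = score("please fix the bugs in the parser", triggers, anti_triggers)
vetoed = score("write documentaton for the fix bug flow", triggers, anti_triggers)
```

The second prompt contains the positive trigger "fix bug", yet it scores zero because the anti-trigger matches first, even with the typo in "documentaton". That is the look-alike veto: positive evidence alone would have selected the wrong recipe.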

🔧 Building Jarvis — open source

The substrate, the skills, and the surrounding cognitive stack are built in the open. Follow along or contribute.

⭐ github.com/globalcaos/tinkerclaw →

Read the paper

📄 Read the full paper (PDF, v4.6) →

23 pages · the full substrate, language-agnostic pseudocode for all six capabilities, the executive-function cycle, and what has shipped vs. what remains open

