How to Track AI Token Costs in Real Time

Posted by Oscar Serra on March 22, 2026

Your $200 AI session happened because you couldn’t see it happening.

Claude Opus costs $15 per million input tokens and $75 per million output tokens. GPT-4 is in roughly the same ballpark. A deep coding session with a large context window can quietly cost more than your lunch, because each turn typically resends the accumulated context as input. The problem isn’t the pricing; it’s the zero visibility.
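To make the arithmetic concrete, here is a minimal sketch that prices a session at the Opus rates quoted above. The session shape (80 turns, a 150k-token context resent each turn, 2k tokens of output per turn) is a made-up illustration, not a measurement.

```python
# USD per million tokens, taken from the Opus figures quoted above.
OPUS_INPUT_PER_M = 15.00
OPUS_OUTPUT_PER_M = 75.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the Opus rates above."""
    return (
        input_tokens * OPUS_INPUT_PER_M + output_tokens * OPUS_OUTPUT_PER_M
    ) / 1_000_000


# Hypothetical session: 80 turns, each resending a 150k-token context
# and producing 2k tokens of output.
turns = 80
session_cost = turns * request_cost(150_000, 2_000)
print(f"${session_cost:.2f}")  # prints $192.00
```

Each turn costs $2.40, and $2.25 of that is input. Output is the more expensive rate, but the resent context is what drives the bill.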

What Token Tracking Actually Means

Token tracking isn’t just logging how many tokens you used after the fact. Real-time tracking means seeing, while the conversation is happening:

Context window composition: how much of your input is system prompt, how much is conversation history, and how much is the actual task?
Response cost breakdown: which model responded, how many tokens did it use, and what did that specific response cost?
Per-provider tracking: when you switch between Claude and GPT mid-session, where did the money go?
Budget alerts: know before you hit your limit, not after.
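The last three items can be sketched in a few lines. This is an illustration under stated assumptions, not TinkerClaw's implementation: `UsageTracker`, its fields, and the 85% alert fraction are hypothetical names and defaults, and the `placeholder-gpt` rates are invented for the example. Only the Opus rates come from this post.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UsageTracker:
    """Hypothetical per-provider spend tracker with a budget alert."""

    prices: dict  # model -> (provider, input USD per M, output USD per M)
    budget_usd: float
    alert_fraction: float = 0.85  # assumed default, tune to taste
    spent_by_provider: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Price one response, add it to the provider total, return its cost."""
        provider, in_rate, out_rate = self.prices[model]
        cost = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
        self.spent_by_provider[provider] += cost
        return cost

    @property
    def total_spent(self) -> float:
        return sum(self.spent_by_provider.values())

    def over_alert_threshold(self) -> bool:
        """True once spend reaches alert_fraction of the budget."""
        return self.total_spent >= self.alert_fraction * self.budget_usd


tracker = UsageTracker(
    prices={
        "claude-opus": ("anthropic", 15.00, 75.00),   # rates quoted in this post
        "placeholder-gpt": ("openai", 10.00, 30.00),  # invented rates, illustration only
    },
    budget_usd=5.00,
)
tracker.record("claude-opus", 100_000, 2_000)
tracker.record("placeholder-gpt", 200_000, 5_000)
alert_before = tracker.over_alert_threshold()  # still under 85% of budget
tracker.record("claude-opus", 50_000, 1_000)
alert_after = tracker.over_alert_threshold()   # now past 85% of budget

for provider, spent in tracker.spent_by_provider.items():
    print(f"{provider}: ${spent:.2f}")
```

Calling `record` on every response, rather than reconciling from a billing export later, is what makes the alert fire before the limit instead of after.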

How TinkerClaw Does It

Tinker UI — our real-time dashboard — shows token data as treemaps. You see your context window as colored blocks: system prompt in one color, conversation history in another, tool results in a third. The proportions shift with every message.

The response treemap does the same for output: model used, token count, cost, latency. All live. No refreshing, no waiting for a billing dashboard the next morning.
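The data behind a context-window treemap is just each segment's share of the total. Here is a rough sketch of that calculation; the function name, segment names, and token counts are hypothetical, and this is not Tinker UI's actual code.

```python
def composition(token_counts: dict[str, int]) -> dict[str, float]:
    """Return each segment's share of the context window as a fraction of 1."""
    total = sum(token_counts.values())
    if total == 0:
        return {name: 0.0 for name in token_counts}
    return {name: count / total for name, count in token_counts.items()}


# Hypothetical token counts for one request.
context = {
    "system_prompt": 3_000,
    "conversation_history": 12_000,
    "tool_results": 5_000,
}

for name, share in composition(context).items():
    print(f"{name:<22}{share:6.1%}")
```

Recomputing these shares after every message is what makes the blocks shift as the conversation grows.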

Practical Cost Reduction Strategies

Cut your system prompt. We went from 23.5KB to 12KB without losing capability. The system prompt is sent with every request, so every byte saved is money saved on every message.
Route by task. Run heartbeats on Haiku ($0.25 per million input tokens), not Opus ($15 per million). That’s a 60x cost difference for the same result.
Set budget pressure thresholds. Above 85% weekly spend, cap at Sonnet. Above 95%, drop to Haiku. Automatic, no surprises.
Use flat-rate models for automated work. Cron jobs, background tasks, self-improvement loops — all flat-rate. Metered models only for interactive sessions.
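The routing and budget-pressure rules above fit in one small function. This is a sketch under assumptions: the task labels, the `flat-rate` placeholder, and the choice of Opus as the default for interactive work are hypothetical. The 85% and 95% thresholds are the ones listed above.

```python
TIERS = ["haiku", "sonnet", "opus"]  # cheapest to most expensive

# Hypothetical task labels for automated work that goes to a flat-rate model.
FLAT_RATE_TASKS = {"cron", "background", "self_improvement"}


def cap_for_pressure(spend_fraction: float) -> str:
    """Return the most expensive tier allowed at this share of weekly budget."""
    if spend_fraction > 0.95:
        return "haiku"
    if spend_fraction > 0.85:
        return "sonnet"
    return "opus"


def pick_model(task: str, spend_fraction: float) -> str:
    """Choose a model from the task type and the current budget pressure."""
    if task in FLAT_RATE_TASKS:
        return "flat-rate"  # placeholder for whichever flat-rate model you use
    # Assumption: heartbeats prefer Haiku, everything else prefers Opus.
    preferred = "haiku" if task == "heartbeat" else "opus"
    cap = cap_for_pressure(spend_fraction)
    # Take whichever of the two is cheaper.
    return min(preferred, cap, key=TIERS.index)


print(pick_model("interactive", 0.50))  # prints opus
print(pick_model("interactive", 0.90))  # prints sonnet
print(pick_model("interactive", 0.97))  # prints haiku
print(pick_model("heartbeat", 0.50))    # prints haiku
print(pick_model("cron", 0.99))         # prints flat-rate
```

Treating the threshold as a cap rather than a switch means a task that already prefers a cheap model never gets upgraded when pressure is low.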