Your $200 AI session happened because you couldn’t see it happening.
Claude Opus costs $15 per million input tokens and $75 per million output tokens. GPT-4 is in roughly the same ballpark. A deep coding session with a large context window can quietly cost far more than your lunch. The problem isn’t the pricing; it’s the zero visibility.
What Token Tracking Actually Means
Token tracking isn’t just logging how many tokens you used after the fact. Real-time tracking means seeing, while the conversation is happening:
- Context window composition — How much of your input is system prompt vs conversation history vs actual task?
- Response cost breakdown — Which model responded, how many tokens, what did that specific response cost?
- Per-provider tracking — When you switch between Claude and GPT mid-session, where did the money go?
- Budget alerts — Know before you hit your limit, not after.
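As a rough illustration of the per-response and per-provider accounting described above, here is a minimal Python sketch. The Opus prices are the ones quoted earlier. The Haiku output price, the model keys, the `SessionTracker` class, and the 80% alert threshold are assumptions made for this example, not a description of any particular tool.

```python
from dataclasses import dataclass, field

# USD per million tokens as (input, output). The Opus figures are quoted in
# this post; the Haiku output price is an assumption for illustration.
PRICES = {
    "claude-opus": (15.00, 75.00),
    "claude-haiku": (0.25, 1.25),
}


def response_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of a single response."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


@dataclass
class SessionTracker:
    """Hypothetical tracker: running spend per model plus a budget alert."""
    budget_usd: float
    alert_at: float = 0.8  # assumed: warn at 80% of the budget
    spent_by_model: dict = field(default_factory=dict)

    @property
    def total(self) -> float:
        return sum(self.spent_by_model.values())

    def record(self, model: str, input_tokens: int, output_tokens: int) -> bool:
        """Add one response; return True once spend reaches the alert level."""
        cost = response_cost(model, input_tokens, output_tokens)
        self.spent_by_model[model] = self.spent_by_model.get(model, 0.0) + cost
        return self.total >= self.alert_at * self.budget_usd


tracker = SessionTracker(budget_usd=2.0)
tracker.record("claude-haiku", 100_000, 2_000)  # cheap background call
if tracker.record("claude-opus", 100_000, 2_000):  # large-context Opus call
    print(f"Budget alert: ${tracker.total:.2f} of ${tracker.budget_usd:.2f}")
```

Calling `record` after every response is what makes the alert fire before the limit rather than after it. The token counts would normally come from the usage data each provider returns with a response.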
How TinkerClaw Does It
Tinker UI — our real-time dashboard — shows token data as treemaps. You see your context window as colored blocks: system prompt in one color, conversation history in another, tool results in a third. The proportions shift with every message.
The response treemap does the same for output: model used, token count, cost, latency. All live. No refreshing, no waiting for a billing dashboard the next morning.
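The proportions behind a context treemap come down to simple shares of the total token count. The sketch below is not TinkerClaw’s implementation; the category names and token counts are made up, and `composition` is a hypothetical helper that shows the arithmetic a dashboard like this would need.

```python
def composition(token_counts: dict[str, int]) -> dict[str, float]:
    """Return each category's share of the total context, as a fraction."""
    total = sum(token_counts.values())
    if total == 0:
        return {name: 0.0 for name in token_counts}
    return {name: count / total for name, count in token_counts.items()}


# Illustrative numbers only: one snapshot of a context window.
context = {"system_prompt": 3_000, "history": 6_000, "tool_results": 3_000}

for name, share in composition(context).items():
    print(f"{name:<14} {share:6.1%}")
```

Recomputing these shares after each message is enough to make the blocks resize as the conversation grows.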
Practical Cost Reduction Strategies
- Cut your system prompt. We went from 23.5KB to 12KB without losing capability. The system prompt is resent as input with every message, so every byte saved is money saved on every message.
- Route by task. Run heartbeats on Haiku ($0.25 per million input tokens), not Opus ($15 per million). That’s a 60x difference in input cost for the same result.
- Set budget pressure thresholds. Above 85% of the weekly budget, cap at Sonnet. Above 95%, drop to Haiku. Automatic, no surprises.
- Use flat-rate models for automated work. Cron jobs, background tasks, self-improvement loops — all flat-rate. Metered models only for interactive sessions.
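The task routing and budget thresholds above can be combined into one small decision function. This is a sketch under assumptions: the tier names, the `TASK_MODEL` table, and the rule that the cheaper of the two choices wins are illustrative, not a documented configuration.

```python
# Model tiers ordered from cheapest to most capable (assumed names).
TIERS = ["haiku", "sonnet", "opus"]

# Hypothetical task table: which tier each kind of work prefers.
TASK_MODEL = {"heartbeat": "haiku", "interactive": "opus"}


def budget_cap(weekly_spend_ratio: float) -> str:
    """Most capable tier allowed at this fraction of the weekly budget."""
    if weekly_spend_ratio > 0.95:
        return "haiku"
    if weekly_spend_ratio > 0.85:
        return "sonnet"
    return "opus"


def pick_model(task: str, weekly_spend_ratio: float) -> str:
    """Pick the task's preferred tier, downgraded if the budget cap is lower."""
    preferred = TASK_MODEL.get(task, "opus")
    cap = budget_cap(weekly_spend_ratio)
    # Whichever of the two sits earlier in TIERS is the cheaper one.
    return min(preferred, cap, key=TIERS.index)


print(pick_model("heartbeat", 0.10))    # heartbeats stay on the cheap tier
print(pick_model("interactive", 0.90))  # above 85%, capped at the middle tier
```

Keeping the cap separate from the task table means the budget rule never upgrades a cheap task; it only ever pulls an expensive one down.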
The Numbers
After implementing real-time tracking and budget-aware routing:
- Average daily cost dropped by roughly 60%
- Zero surprise bills (budget alerts fire before limits hit)
- Same agent capability — just smarter about which model handles what
Stop guessing what your AI costs. Start seeing it.
