CostRoot — Synthetic Sample Report

Ranked leaks · each with its fix

#01

$1,519/mo recoverable

Caching left on the table

PROMPT CACHING RAG service

anthropic.claude-sonnet-4-5 · 4,200 calls re-send ~28,400 prefix tokens · cacheRead=0

Enable prompt caching: add a cachePoint after the system/tools block. On subagents, Bedrock disables sub-agent caching by default — turn it on explicitly. Cache reads run ~90% cheaper than re-sending the prefix every call.

#02

$1,116/mo recoverable

Cache expiring between calls

CACHE TTL support-copilot

anthropic.claude-sonnet-4-5 · 899 cold re-writes of a ~72,000-token cached prefix · expired past the 300s TTL

Extend the cache TTL 5m → 1h: a stable prefix is being re-cached — paying the write premium — every time the workload idles past the current 5-minute window. Set a longer cache_control TTL on the stable system/tools block where the model supports it, or keep sessions warm. Net of the longer-TTL write premium.

#03

$458/mo recoverable

Batch-eligible traffic on on-demand

BATCH 50% nightly job

anthropic.claude-sonnet-4 · 2,000 calls · role identifies as a job, regular cadence (cv=0.00) · looks async

Move to Batch inference: submit JSONL to S3 for a flat 50% cut. Confirm this nightly job tolerates ~24h turnaround before switching — it already runs on a fixed schedule.

#04

$197/mo recoverable

Premium model doing cheap work

ROUTE / DOWNSHIFT Opus extractor

anthropic.claude-opus-4-5 · 1,600 calls · median output 80 tok · routing/extraction-shaped

Route or downshift the Opus extractor: enable Intelligent Prompt Routing (~30%, zero code change) as the safe first step. Bigger wins — downshift to Haiku/Nova, or distill (~75%) for this fixed extraction task.