$3,289/mo you can fix
Spend you can fix right now, with the exact change for each. Leaks are detected read-only from your invocation logs;
the figure above counts only fixes you've confirmed your team owns.
Read the number this way — Customer-actionable assumes you've confirmed you own each fix
lever (the owner-mapping step). From logs alone, every finding starts at $0 actionable until you confirm ownership.
Ranked leaks · each with its fix
#01
$1,519/mo recoverable
Caching left on the table
PROMPT CACHING
RAG service
anthropic.claude-sonnet-4-5 · 4,200 calls re-send ~28,400 prefix tokens · cacheRead=0
Enable prompt caching: add a cachePoint after the system/tools block.
On subagents, Bedrock disables sub-agent caching by default — turn it on explicitly. Cache reads run ~90% cheaper than re-sending the prefix every call.
#02
$1,116/mo recoverable
Cache expiring between calls
CACHE TTL
support-copilot
anthropic.claude-sonnet-4-5 · 899 cold re-writes of a ~72,000-token cached prefix · expired past the 300s TTL
Extend the cache TTL 5m → 1h: a stable prefix is being re-cached — paying the write premium — every time the workload idles past the current 5-minute window.
Set a longer cache_control TTL on the stable system/tools block where the model supports it, or keep sessions warm. Net of the longer-TTL write premium.
Batch-eligible traffic on on-demand
BATCH 50%
nightly job
anthropic.claude-sonnet-4 · 2,000 calls · role identifies as a job, regular cadence (cv=0.00) · looks async
Move to Batch inference: submit JSONL to S3 for a flat 50% cut.
Confirm this nightly job tolerates ~24h turnaround before switching — it already runs on a fixed schedule.
Premium model doing cheap work
ROUTE / DOWNSHIFT
Opus extractor
anthropic.claude-opus-4-5 · 1,600 calls · median output 80 tok · routing/extraction-shaped
Route or downshift the Opus extractor: enable Intelligent Prompt Routing
(~30%, zero code change) as the safe first step. Bigger wins — downshift to Haiku/Nova, or distill (~75%) for this fixed extraction task.
The honest counterpoint: run this same scan against an already-tuned workload and Verdict reports
~$0 you can fix — caching on, no churn, right-sized models. We only ever headline spend you can actually recover.