How to Reduce AI Coding Agent Costs
Claude Code, Cursor, and Copilot bills add up. Here's where the money goes and how to cut it.
Where your money actually goes
If you're using Claude Code on the API (not the subscription), your bill breaks down like this:
| Component | % of bill | Why |
|---|---|---|
| Input tokens | 85-95% | Claude reading your code files for context |
| Output tokens | 5-15% | Claude's responses |
The expensive part isn't Claude's answers. It's Claude reading your code. Every question triggers full-file reads. A medium project session costs $0.14 on Sonnet or $0.70 on Opus, mostly from input.
5 ways to reduce costs
1. Use a cheaper model for routine tasks
Opus ($15/1M input) is 5x more expensive than Sonnet ($3/1M input). Use Sonnet for most tasks, switch to Opus only for complex reasoning. Haiku ($0.25/1M) works for simple code generation.
2. Reduce input tokens (biggest impact)
This is where Code Context Engine fits in. Instead of Claude reading entire files, CCE indexes your codebase and returns only the relevant functions. 94% fewer input tokens, benchmarked.
uv tool install code-context-engine cd your-project cce init
Before CCE: 83,681 tokens per query. After: 4,927 tokens. Same answer quality.
3. Compress output tokens
Tools like output compression reduce Claude's reply length. CCE includes this built in (4 levels: off, lite, standard, max). Saves 30-75% on output tokens. Smaller impact than input savings but still meaningful.
4. Use prompt caching
If you're using the Claude API directly, enable prompt caching. Repeated context (system prompts, file contents) gets cached and costs 90% less on subsequent reads. This is automatic in Claude Code.
5. Be specific in your prompts
Vague prompts ("fix the auth") make Claude read more files to understand what you mean. Specific prompts ("fix the JWT expiry check in auth/tokens.py line 42") read fewer files. Less context = lower cost.
Impact comparison
| Technique | Token savings | Effort |
|---|---|---|
| Cheaper model | 3-5x cost reduction | One setting change |
| Code indexing (CCE) | 94% input reduction | One command |
| Output compression | 30-75% output reduction | One setting |
| Prompt caching | 90% on repeated context | Automatic |
| Better prompts | Varies | Habit change |
What this looks like in practice
A developer on Sonnet spending $0.14/session:
| Optimization | Session cost |
|---|---|
| No optimization | $0.14 |
| + CCE (94% input reduction) | $0.04 |
| + Output compression | $0.03 |
That's 5-10 sessions per day without thinking about costs.
Start with the biggest lever
Input tokens are 85-95% of your bill. Cut them first.
uv tool install code-context-engine && cce initView on GitHub