MCP Server  ·  Local  ·  Zero Cloud

Index once.
Save 93%.

A local MCP server for AI coding agents. AST-aware indexing, semantic search, and automatic compression. Your agent stops re-reading your entire codebase every session.

$ uv tool install code-context-engine
~/my-project
$ cce init

  Code Context Engine · my-project
  ────────────────────────────────
  Git hooks installed (3 hooks, auto-updates)
  MCP server registered in .mcp.json
  CLAUDE.md created
  .gitignore updated

  Indexing project... ██████████████████████████████ 89/89 files 100%

  Indexed 1,247 chunks from 89 files
  Done! Restart Claude Code to activate CCE.
Works with your editor
Claude Code · Cursor · VS Code · Gemini CLI · Codex CLI · OpenCode

See it in action

Install, index, search, and see savings in under 30 seconds.


Verified on FastAPI.
Reproduce it yourself.

20 real coding questions against FastAPI (48 source files, 19K lines). No synthetic queries, no cherry-picking.

Retrieval       93%      full files → relevant chunks
Compression     90%      chunks → signatures
Recall@10       0.80     found the right files
Latency p50     0.4 ms   per search query

Token flow per query (avg):
  Full files          75,355 tokens
  After retrieval      5,381 tokens
  After compression      541 tokens
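
Sanity check: the headline percentages follow directly from that flow.

  Retrieval:    1 − 5,381/75,355 ≈ 0.93 → 93% saved
  Compression:  1 − 541/5,381 ≈ 0.90 → 90% saved
  End to end:   541/75,355 ≈ 0.7% of the original tokens reach the model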

Per-Layer Savings

Each layer measured against its own baseline. Not stacked.

Retrieval · measured 93%
Chunk Compression · measured 90%
Output Compression · estimated 65%
Grammar · measured 13%

Reproduce it

$ pip install code-context-engine
$ python benchmarks/run_benchmark.py \
    --repo https://github.com/fastapi/fastapi.git \
    --source-dir fastapi

Full results: benchmarks/results/fastapi.md

Three commands.
Permanent savings.

01
🗂
Index your codebase

Tree-sitter parses your code into semantic chunks (functions, classes, modules). Stored locally with vector embeddings. Git hooks keep the index current automatically. The chunking step is sketched below.

cce init
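
A minimal sketch of that chunking step, assuming the py-tree-sitter (0.22+) and tree-sitter-python packages; CCE's real chunker covers more languages and metadata:

# A sketch, not CCE's actual chunker: split a Python file into one
# chunk per top-level function or class using tree-sitter.
import tree_sitter_python
from tree_sitter import Language, Parser

parser = Parser(Language(tree_sitter_python.language()))

def chunk_file(path: str):
    """Yield one chunk per top-level function or class in a Python file."""
    source = open(path, "rb").read()
    tree = parser.parse(source)
    for node in tree.root_node.children:
        if node.type in ("function_definition", "class_definition"):
            yield {
                "file": path,
                "kind": node.type,  # "function_definition" or "class_definition"
                "lines": (node.start_point[0] + 1, node.end_point[0] + 1),
                "text": source[node.start_byte:node.end_byte].decode(),
            }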
02
🔍
Claude searches, not reads

Instead of reading whole files, Claude calls context_search via MCP. Hybrid vector + keyword search finds the relevant chunks. Gets the 800 tokens it needs, not an 8,000-token dump. An example call is sketched below.

context_search "payment"
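
On the wire this is an MCP tools/call round trip. The method and tool name are real; the argument and result fields below are illustrative, not CCE's actual schema:

request = {
    "method": "tools/call",
    "params": {
        "name": "context_search",
        "arguments": {"query": "payment", "top_k": 5},  # parameter names assumed
    },
}
response = {  # illustrative result shape: a handful of scored chunks, not whole files
    "chunks": [
        {"file": "billing/stripe.py", "symbol": "charge_card", "score": 0.91},
    ],
}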
03
📊
Track real savings

Every query is recorded. cce savings shows exactly how many tokens CCE saved you, with dollar estimates from live Anthropic pricing. The math is sketched below.

cce savings
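
A back-of-the-envelope version of that estimate. The per-token price here is an assumed flat rate; the real tool pulls live pricing:

INPUT_PRICE_PER_MTOK = 15.00  # USD per million input tokens (assumed Opus-class rate)

def dollars_saved(baseline_tokens: int, actual_tokens: int) -> float:
    """Tokens avoided, converted to dollars at the assumed input price."""
    return (baseline_tokens - actual_tokens) / 1_000_000 * INPUT_PRICE_PER_MTOK

# Average FastAPI-benchmark query: 75,355 tokens as full files vs. 541 after CCE
print(f"${dollars_saved(75_355, 541):.2f}")  # ≈ $1.12 saved per query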

Everything Claude needs.
Nothing it doesn't.

🌳
AST-Aware Chunking

Tree-sitter parses Python, JavaScript, TypeScript, PHP, Go, Rust, and Java into semantic chunks. Functions, classes, imports. No raw file dumps, no context waste.

🔎
Hybrid Search + Graph Expansion

Vector similarity + BM25 keyword search merged via Reciprocal Rank Fusion. Then CCE walks one hop on the code graph: if auth.py is a hit, the utils.py it imports comes along too.
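
The fusion step itself is tiny. A sketch using the conventional k = 60 constant (CCE's exact parameters aren't documented here):

def rrf_merge(vector_hits: list, keyword_hits: list, k: int = 60) -> list:
    """Merge two best-first rankings of chunk ids; fused score sums 1/(k + rank)."""
    scores: dict = {}
    for ranking in (vector_hits, keyword_hits):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

rrf_merge(["auth.py:login", "utils.py:hash_pw"], ["db.py:query", "auth.py:login"])
# → ["auth.py:login", "db.py:query", "utils.py:hash_pw"]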

🧠
Smart Compression

With Ollama running locally, chunks are summarized by phi3:mini. Without it, smart truncation extracts signatures and docstrings. Four output levels: off, lite, standard, max.
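
The truncation fallback is roughly this, sketched with the standard-library ast module on Python 3.9+ (CCE's real heuristics, and the Ollama path, are more involved):

import ast

def compress_chunk(source: str) -> str:
    """Keep signatures and first docstring lines; drop every body."""
    out = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            out.append(f"class {node.name}:")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            out.append(f"def {node.name}({ast.unparse(node.args)}):")
        else:
            continue
        doc = ast.get_docstring(node)
        if doc:
            out.append(f'    """{doc.splitlines()[0]}"""')
    return "\n".join(out)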

🔗
Session Memory

Recall decisions and code areas across sessions via session_recall, record_decision, and record_code_area. No re-explaining your architecture every session.
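
Illustrative payloads for those tools. The tool names are CCE's; every argument field below is an assumed schema, not the documented one:

record_decision_call = {
    "name": "record_decision",
    "arguments": {                                # all fields assumed
        "decision": "JWT auth with 15-minute expiry",
        "context": "stateless workers; see auth/tokens.py",
    },
}
session_recall_call = {
    "name": "session_recall",
    "arguments": {"query": "auth token expiry"},  # assumed field
}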

🛡
Security by Default

Secret files (.env, *.pem) are never indexed. Content is scanned for API keys and credentials. PII is scrubbed from memory writes. Path traversal protection on all inputs.
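
In spirit, the first two guards are a filename denylist plus a content scan. A sketch with assumed patterns; CCE's actual rules are broader:

import fnmatch
import re

DENYLIST = (".env", "*.pem", "*.key")  # assumed glob set
SECRET_RE = re.compile(r"sk-[A-Za-z0-9]{20,}|AKIA[0-9A-Z]{16}")  # common API-key shapes

def safe_to_index(path: str, content: str) -> bool:
    name = path.rsplit("/", 1)[-1]
    if any(fnmatch.fnmatch(name, pat) for pat in DENYLIST):
        return False  # secret files are never indexed
    return SECRET_RE.search(content) is None  # skip files that hold credentials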

📊
Web Dashboard

Run cce dashboard for donut charts, bar graphs, file health, and live 5-second polling. Or use cce savings for a quick terminal summary.

How CCE stacks up

Feature                                      No tool   Caveman   CCE (default)   CCE + Ollama
Compress output tokens                          ✗         ✓           ✓               ✓
Compress input tokens                           ✗         ✗           ✓               ✓
Codebase indexing                               ✗         ✗           ✓               ✓
Session memory                                  ✗         ✗           ✓               ✓
LLM summarization                               ✗         ✗           ✗               ✓
Cost per session (Opus 4, medium project)     $2.25     $1.28       $0.68           $0.45

Stop paying to
re-read code.

One install. Automatic savings. Everything local.

$ uv tool install code-context-engine
1  uv tool install code-context-engine
2  cce init
3  Restart Claude Code
→  Saving tokens