Introduction

Code Context Engine (CCE) is a local MCP server that indexes your codebase so AI coding agents search for relevant code instead of reading entire files.

Every time an AI agent needs to understand your code, it reads entire files. A 500-line file costs 500 lines of input tokens even when the agent only needs one function. Across a session, this adds up to thousands of wasted tokens and real dollars.

CCE parses your code into semantic chunks (functions, classes, modules) using Tree-sitter, stores them with vector embeddings, and serves only the relevant pieces when the agent asks a question.
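
To make the idea of semantic chunks concrete, here is a minimal sketch. It does not use Tree-sitter: it swaps in Python's standard-library `ast` module as a stand-in so the example runs without extra dependencies, and it handles only top-level Python functions and classes. The `Chunk` dataclass and `chunk_python_source` function are hypothetical names for illustration, not part of CCE.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    """One semantic unit of source code (hypothetical shape, not CCE's schema)."""
    kind: str        # "function" or "class"
    name: str
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive
    text: str


def chunk_python_source(source: str) -> list[Chunk]:
    """Split Python source into top-level function and class chunks."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue  # skip imports, assignments, and other top-level statements
        text = "\n".join(lines[node.lineno - 1:node.end_lineno])
        chunks.append(Chunk(kind, node.name, node.lineno, node.end_lineno, text))
    return chunks


SAMPLE = '''import os

def load(path):
    return open(path).read()

class Index:
    def add(self, chunk):
        pass
'''

chunks = chunk_python_source(SAMPLE)
for c in chunks:
    print(c.kind, c.name, c.start_line, c.end_line)
```

Each chunk carries its own line range, so a search hit can return just that span rather than the whole file.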

Result: 94% input token savings, reproducibly benchmarked.
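
As a rough illustration of how a savings percentage like this can be computed, the sketch below compares the size of a whole file with the size of the one chunk actually served. The numbers are hypothetical (line counts used as a proxy for tokens), and this is not the methodology behind the benchmark figure.

```python
def input_savings(baseline_units: int, served_units: int) -> float:
    """Fraction of input avoided by serving a chunk instead of the whole file."""
    if baseline_units <= 0:
        raise ValueError("baseline_units must be positive")
    return 1 - served_units / baseline_units


# Hypothetical numbers: a 500-line file where the agent needs one 40-line function.
savings = input_savings(baseline_units=500, served_units=40)
print(f"{savings:.0%}")
```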

| Tool | Purpose |
| --- | --- |
| context_search | Hybrid vector + keyword search with graph expansion |
| get_chunk | Retrieve a specific chunk by ID |
| record_decision | Store architectural decisions for cross-session recall |
| record_code_area | Mark areas you’ve worked on |
| session_recall | Recall decisions and code areas |
| session_timeline | Browse tool call history |
| session_event | Inspect a specific past event |
| set_output_level | Control output compression (off/lite/standard/max) |
| set_scope | Limit search to specific directories |

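
For orientation, MCP clients invoke tools like these with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for context_search. The `query` argument name is an assumption, since the tool's input schema is not shown here, and `build_tool_call` is a hypothetical helper. In practice the editor's MCP client sends this message for you.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "query" is an assumed argument name; check the tool's published input schema.
request = build_tool_call(1, "context_search", {"query": "where are embeddings stored?"})
print(request)
```
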

| Editor | Config written | Instructions |
| --- | --- | --- |
| Claude Code | .mcp.json | CLAUDE.md |
| VS Code / Copilot | .vscode/mcp.json | .github/copilot-instructions.md |
| Cursor | .cursor/mcp.json | .cursorrules |
| Gemini CLI | .gemini/settings.json | GEMINI.md |
| OpenAI Codex | ~/.codex/config.toml | AGENTS.md |
| OpenCode | opencode.json | |
| Tabnine | .tabnine/agent/settings.json | TABNINE.md |

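
To give a sense of what a written config might contain, the sketch below prints a project-scoped `.mcp.json` in the `mcpServers` layout that Claude Code reads. The server name `cce` and the `serve` argument are placeholders chosen for illustration. The file CCE actually writes may use a different command and arguments.

```python
import json

# Assumed entry: the server name "cce" and the "serve" argument are placeholders,
# not confirmed CCE values.
server_entry = {"command": "cce", "args": ["serve"]}

# Claude Code's project-scoped .mcp.json maps server names to launch commands.
config = {"mcpServers": {"cce": server_entry}}

print(json.dumps(config, indent=2))
```
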
  1. Index: Tree-sitter parses code into semantic chunks, which are stored locally with vector embeddings.
  2. Search: The agent calls context_search via MCP. Vector and BM25 keyword results are merged with Reciprocal Rank Fusion, and graph expansion adds related imports.
  3. Compress: Chunks are compressed, either by truncation or by an LLM summary generated with Ollama. Session-wide output compression rules in the instruction files also reduce reply tokens (diff-only code, no filler).
  4. Track: Every query is recorded. Running cce savings shows the tokens and dollars saved.
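
The merge in the Search step can be sketched as follows. This is a generic implementation of Reciprocal Rank Fusion, not CCE's code: each chunk scores the sum of 1 / (k + rank) over every result list it appears in. The constant k = 60 is the value commonly used in the literature and is an assumption here, as are the chunk IDs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge ranked ID lists: score = sum of 1 / (k + rank), with rank starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for a stable result.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the vector index and the BM25 keyword index.
vector_hits = ["auth.login", "auth.session", "db.connect"]
keyword_hits = ["auth.session", "utils.hash", "auth.login"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
for chunk_id, score in fused:
    print(f"{chunk_id}  {score:.4f}")
```

Chunks that appear in both lists rise to the top even when neither list ranks them first, which is why rank fusion suits combining semantic and keyword search without having to put their raw scores on a common scale.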