
Configuration

CCE works with zero configuration out of the box. This page covers every available option for when you need to tune its behavior. Settings are read from two files:

  • Global: ~/.cce/config.yaml (created automatically on first use)
  • Per-project: .context-engine.yaml in your project root (overrides global for that project)

Full reference of available options:

compression:
  level: standard            # How much to compress code chunks before sending to the agent
                             # Options: minimal | standard | full
  output: standard           # How much to compress agent responses
                             # Options: off | lite | standard | max
  model: phi3:mini           # Ollama model for LLM-based summarization
                             # Auto-detected if Ollama is running. Ignored if Ollama is off.
indexer:
  watch: true                # Keep index in sync via git hooks
  ignore:                    # Directories and patterns to skip during indexing
    - .git
    - node_modules
    - __pycache__
    - .venv
    - dist
    - build
retrieval:
  top_k: 20                  # Maximum chunks returned per query
  confidence_threshold: 0.5  # Minimum score to include a result (0.0 to 1.0)
embedding:
  model: BAAI/bge-small-en-v1.5  # Embedding model (fastembed-compatible)
pricing:
  model: opus                # Model for cost estimates in `cce savings`
                             # Options: opus | sonnet | haiku
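
A per-project .context-engine.yaml overrides the global file, but this page does not spell out the exact merge rules. The Python sketch below assumes a recursive merge in which project values win key by key and lists such as indexer.ignore are replaced rather than concatenated. merge_config is a hypothetical helper, not part of CCE.

```python
from copy import deepcopy
from typing import Any


def merge_config(global_cfg: dict[str, Any], project_cfg: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: overlay project settings on global ones.

    Nested sections are merged key by key; any non-dict value in the
    project file (including lists) replaces the global value outright.
    """
    merged = deepcopy(global_cfg)  # never mutate the caller's global config
    for key, value in project_cfg.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


global_cfg = {
    "retrieval": {"top_k": 20, "confidence_threshold": 0.5},
    "pricing": {"model": "opus"},
}
project_cfg = {"retrieval": {"top_k": 8}}

print(merge_config(global_cfg, project_cfg))
# → {'retrieval': {'top_k': 8, 'confidence_threshold': 0.5}, 'pricing': {'model': 'opus'}}
```

Under this assumption, only top_k changes for the project. The confidence_threshold and pricing model still come from the global file.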

compression.level controls how much CCE compresses code chunks before including them in the agent’s context.

Level      Behavior
minimal    Truncation only. Keeps signature and docstring, drops body.
standard   Truncation plus light summarization if Ollama is available.
full       Full LLM summarization via Ollama (requires Ollama running).
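
To make the minimal level concrete: for Python source, the standard ast module can keep a function's signature and docstring while dropping its body. This is only an illustration. CCE's own chunker is not described here and may handle other languages differently, and truncate_to_signature is a hypothetical name.

```python
import ast


def truncate_to_signature(source: str) -> str:
    """Hypothetical sketch of `minimal`: keep signatures and docstrings, drop bodies."""
    tree = ast.parse(source)
    # Snapshot the nodes first so rewriting bodies does not disturb the walk.
    for node in list(ast.walk(tree)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            new_body: list[ast.stmt] = []
            if ast.get_docstring(node) is not None:
                new_body.append(node.body[0])  # the docstring expression itself
            new_body.append(ast.Expr(value=ast.Constant(value=...)))  # body placeholder
            node.body = new_body
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


source = '''
def charge(amount: int) -> bool:
    """Charge the customer and report success."""
    total = amount * 2
    return total > 0
'''

print(truncate_to_signature(source))
```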

compression.output controls how verbose the agent’s responses are. During cce init, the configured level is written into the agent instruction files (CLAUDE.md, AGENTS.md, .cursorrules, etc.). As a result, it applies to the entire session, not just to CCE tool responses.

Level      Style                                        Typical savings
off        Full output                                  0%
lite       No filler/hedging, diff-only code            ~25%
standard   Fragments, short synonyms, diff-only code    ~70%
max        Telegraphic, abbreviations, diff-only code   ~80%
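
The percentages above are typical, not guaranteed. If you read them as token savings (an assumption), a response that would otherwise be 1,000 tokens shrinks to about 750 tokens at lite, 300 at standard, and 200 at max. The hypothetical helper below simply applies the table's figures.

```python
# Typical savings per compression.output level, copied from the table above.
# Real savings vary by task; these are approximations.
TYPICAL_SAVINGS = {"off": 0.00, "lite": 0.25, "standard": 0.70, "max": 0.80}


def estimated_response_tokens(baseline_tokens: int, level: str) -> int:
    """Hypothetical helper: approximate response length after compression."""
    if level not in TYPICAL_SAVINGS:
        raise ValueError(f"unknown output level: {level!r}")
    return round(baseline_tokens * (1 - TYPICAL_SAVINGS[level]))


for level in TYPICAL_SAVINGS:
    print(level, estimated_response_tokens(1_000, level))
```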

All levels include these code output rules: show only changed lines, never rewrite entire files, and never echo back unchanged code. Code blocks, paths, commands, and error messages are never compressed. Security warnings are always written out in full.

To change the level, edit compression.output and re-run cce init to update the instruction files, or change it at runtime:

set_output_level output_level=max

To use a different embedding model, set embedding.model:

embedding:
  model: sentence-transformers/all-mpnet-base-v2

Any model available in fastembed works. Changing the model requires a full re-index:

cce clear --yes && cce index --full
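
A full re-index is needed because vectors from different embedding models are not comparable, and they often differ in length. BAAI/bge-small-en-v1.5 outputs 384-dimensional vectors, while sentence-transformers/all-mpnet-base-v2 outputs 768-dimensional ones. The minimal sketch below shows the cosine-similarity comparison that vector search relies on and where it breaks. The dimension guard is illustrative and is not CCE's code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        # An index built with one model cannot be queried with another.
        raise ValueError(f"dimension mismatch: index has {len(a)}, query has {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


stored = [0.1] * 384  # e.g. a chunk embedded with a 384-dimensional model
query = [0.1] * 768   # e.g. a query embedded with a 768-dimensional model

try:
    cosine_similarity(stored, query)
except ValueError as err:
    print(err)  # dimension mismatch: index has 384, query has 768
```

Matching lengths alone would not be enough. Two models with the same dimensionality still place text in unrelated spaces, so scores across models are meaningless. That is why the whole index is rebuilt.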

The default BAAI/bge-small-en-v1.5 is recommended for most use cases. It balances quality, speed, and size well.

top_k controls how many chunks the retriever returns per query. Higher values surface more context but cost more tokens. Default: 20.

confidence_threshold sets the minimum score to include a result. Range 0.0 to 1.0. Lower values return more results; higher values return only strong matches. Default: 0.5.
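
The sketch below shows one way the two settings might combine. It assumes the retriever first drops results scored below confidence_threshold (treating the threshold as inclusive) and then keeps the top_k highest-scoring ones. The Hit type and select_chunks function are hypothetical, and CCE's actual ranking pipeline may differ.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    chunk_id: str
    score: float  # similarity score in the 0.0 to 1.0 range


def select_chunks(hits: list[Hit], top_k: int = 20, confidence_threshold: float = 0.5) -> list[Hit]:
    """Hypothetical: filter by minimum score, then keep the best top_k."""
    kept = [h for h in hits if h.score >= confidence_threshold]
    kept.sort(key=lambda h: h.score, reverse=True)
    return kept[:top_k]


hits = [Hit("billing.py:charge", 0.91), Hit("utils.py:slugify", 0.32),
        Hit("billing.py:refund", 0.74), Hit("api.py:checkout", 0.55)]

print([h.chunk_id for h in select_chunks(hits, top_k=2)])
# → ['billing.py:charge', 'billing.py:refund']
```

Lowering the threshold to 0.3 would let utils.py:slugify through. With top_k=2, however, it would still be cut.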

At runtime, the agent can pass top_k and max_tokens directly to context_search:

context_search(query="payment processing", top_k=5, max_tokens=3000)
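
The page does not say how max_tokens is enforced. One plausible reading is a budget: walk the ranked chunks in order and stop before the running total would exceed max_tokens. The sketch below follows that assumption. pack_within_budget is hypothetical, and whitespace word counts stand in for a real tokenizer.

```python
from typing import Callable


def pack_within_budget(
    ranked_chunks: list[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> list[str]:
    """Hypothetical: keep the highest-ranked chunks that fit in max_tokens."""
    packed: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            break  # stop at the first chunk that would overflow the budget
        packed.append(chunk)
        used += cost
    return packed


chunks = ["def charge(amount): ...", "def refund(order): ...", "class Invoice: ..."]
print(pack_within_budget(chunks, max_tokens=6))
```

A real implementation would count tokens with the model's tokenizer. It might also skip an oversized chunk and keep going instead of stopping.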

The indexer.ignore list supports:

  • Directory names: node_modules, dist
  • File patterns: "*.generated.ts", "*.min.js"
  • Relative paths: "src/legacy/"

Files matching .gitignore are also skipped automatically.
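
The following minimal sketch shows how the three pattern kinds could be matched against a path relative to the project root. is_ignored is hypothetical. CCE's own matcher and its .gitignore handling are not shown here and may behave differently at the edges.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

GLOB_CHARS = set("*?[")


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Hypothetical: does a project-relative path match any indexer.ignore entry?"""
    path = PurePosixPath(rel_path)
    for pattern in patterns:
        if pattern.endswith("/"):
            # Relative path such as "src/legacy/": skip everything beneath it.
            prefix = PurePosixPath(pattern.rstrip("/"))
            if path == prefix or prefix in path.parents:
                return True
        elif GLOB_CHARS & set(pattern):
            # File pattern such as "*.min.js": matched against the file name.
            if fnmatchcase(path.name, pattern):
                return True
        elif pattern in path.parts:
            # Directory name such as "node_modules": matched at any depth.
            return True
    return False


patterns = [".git", "node_modules", "dist", "*.generated.ts", "*.min.js", "src/legacy/"]
for p in ["node_modules/react/index.js", "web/app.min.js", "src/legacy/old.py", "src/legacy_v2/app.py"]:
    print(p, is_ignored(p, patterns))
```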

pricing:
  model: sonnet  # opus (default) | sonnet | haiku

This determines which model’s pricing is used for cost estimates in cce savings. Prices are fetched from Anthropic’s docs and cached for 7 days.
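
The 7-day cache and the cost arithmetic can be sketched as follows. Both helpers are hypothetical, and treating a cache exactly 7 days old as stale is an assumption. The real prices are fetched from Anthropic's docs at runtime, so the rate passed in below is a placeholder parameter, not an actual Anthropic price.

```python
from datetime import datetime, timedelta, timezone

PRICE_CACHE_TTL = timedelta(days=7)  # prices are cached for 7 days


def price_cache_is_fresh(fetched_at: datetime, now: datetime) -> bool:
    """Hypothetical: reuse cached prices until they are 7 days old."""
    return now - fetched_at < PRICE_CACHE_TTL


def estimated_savings_usd(tokens_saved: int, usd_per_million_tokens: float) -> float:
    """Hypothetical: convert saved tokens into dollars at a given rate."""
    return tokens_saved / 1_000_000 * usd_per_million_tokens


fetched = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(price_cache_is_fresh(fetched, fetched + timedelta(days=3)))  # True
print(price_cache_is_fresh(fetched, fetched + timedelta(days=8)))  # False
print(estimated_savings_usd(2_000_000, usd_per_million_tokens=10.0))  # 20.0 (placeholder rate)
```

Switching pricing.model only changes the rate that goes into this arithmetic. The same saved tokens therefore translate into different dollar figures for opus, sonnet, and haiku.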

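If Ollama is running on a non-default address, set the OLLAMA_HOST environment variable. The example below uses Ollama’s default address, http://localhost:11434; replace it with your own host and port: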

export OLLAMA_HOST=http://localhost:11434
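
The following minimal Python sketch resolves that variable and probes the server. It assumes detection works by requesting Ollama's /api/tags endpoint, which lists installed models. CCE's own detection logic is not documented here, and the helper names are hypothetical. Because Ollama also accepts OLLAMA_HOST values without a scheme, such as 127.0.0.1:11434, the sketch adds http:// when it is missing.

```python
import http.client
import os
import urllib.request
from collections.abc import Mapping

DEFAULT_OLLAMA_HOST = "http://localhost:11434"  # Ollama's default address


def resolve_ollama_host(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical: read OLLAMA_HOST, falling back to Ollama's default."""
    host = env.get("OLLAMA_HOST", "").strip() or DEFAULT_OLLAMA_HOST
    if "://" not in host:
        host = "http://" + host  # allow values such as "127.0.0.1:11434"
    return host.rstrip("/")


def ollama_is_running(host: str, timeout: float = 1.0) -> bool:
    """Hypothetical: treat a successful GET /api/tags as 'Ollama is available'."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):  # URLError is an OSError subclass
        return False


print(resolve_ollama_host({"OLLAMA_HOST": "gpu-box:11434"}))  # http://gpu-box:11434
print(ollama_is_running(resolve_ollama_host()))
```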

CCE auto-detects available RAM and adjusts behavior:

RAM               Profile    Behavior
Less than 12 GB   light      Truncation only, small embedding batches
12 to 32 GB       standard   Full pipeline, standard batch sizes
More than 32 GB   full       Larger Ollama models, larger batches

You do not need to set this manually.
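
The sketch below maps the table's thresholds to profiles. How CCE actually measures memory is not documented. This version reads physical memory through POSIX sysconf, which works on Linux but not on Windows, and treats a GB as 1024³ bytes. Both choices are assumptions, and the function names are hypothetical.

```python
import os


def profile_for_ram(total_gb: float) -> str:
    """Hypothetical: map total RAM to the profiles in the table above."""
    if total_gb < 12:
        return "light"
    if total_gb <= 32:  # "12 to 32 GB" read as inclusive
        return "standard"
    return "full"


def total_ram_gb() -> float:
    """Physical memory via sysconf; works on Linux, not on Windows."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


for gb in (8, 16, 32, 64):
    print(gb, profile_for_ram(gb))
```

In this sketch, a machine with exactly 32 GB lands in standard because the "12 to 32 GB" row is read as inclusive.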

  • All data stays local. No code is sent to external services (unless you use a cloud embedding model).
  • Index data is stored in ~/.cce/projects/.
  • The MCP server only listens on stdio (not network) when launched by an agent.