Semble: A semantic code search engine for AI agents

What problem does it solve?

You know that feeling when your AI coding agent burns through your API budget reading entire files just to find one function? Agents that default to grep or ripgrep end up pulling whole files into context, sending tens of thousands of tokens for every lookup. A single question about your authentication flow can trigger 45,692 tokens of context, most of it irrelevant code surrounding the few lines that matter. Out of the box, there is no way to give an agent a chunk-level, semantically aware code index that runs locally, needs no API keys, and answers in milliseconds.

Tags: code-search, embeddings, mcp, mcp-server, agents, devtools, python

How it works

When you run Semble on a codebase, it parses the code with tree-sitter, a grammar-aware parser that splits code at real boundaries like function and class definitions rather than arbitrary line counts. Each chunk gets two representations:

- a 256-dimensional embedding from `potion-code-16M`, which averages pre-computed token embeddings without running attention (like averaging word meanings instead of reading the whole sentence in order);
- an entry in a BM25 lexical index.

At query time, both retrievers rank candidates independently, and reciprocal rank fusion merges the two lists. A post-fusion re-ranker then boosts definition matches and penalizes noise. The full round-trip takes about 1.5 ms on a warm index, with no GPU and no network call.
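Semble's chunker is built on tree-sitter, which covers many languages. As a Python-only stand-in, the standard library's `ast` module can show the same idea: split a file at top-level `def` and `class` boundaries so that each chunk is one whole definition. The function name `chunk_by_definitions` is hypothetical. This sketch also skips module-level code and decorators, which a real chunker would have to handle.

```python
import ast


def chunk_by_definitions(source: str) -> list[tuple[str, str]]:
    """Split Python source into (name, code) chunks at top-level def/class boundaries.

    Hypothetical stand-in for tree-sitter chunking: uses the stdlib `ast` module,
    so it only understands Python. Module-level statements outside any definition
    are dropped, and decorators are not included in the chunk text.
    """
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact source text spanned by the node
            chunks.append((node.name, ast.get_source_segment(source, node)))
    return chunks


SAMPLE = '''
import hashlib

def hash_password(pw: str) -> str:
    return hashlib.sha256(pw.encode()).hexdigest()

class Session:
    def __init__(self, user):
        self.user = user
'''

for name, code in chunk_by_definitions(SAMPLE):
    print(f"--- {name} ({len(code.splitlines())} lines)")
    print(code)
```

Each printed chunk is a complete function or class. That is what lets a search hand the agent `hash_password` on its own instead of the whole file.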
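To make the averaging idea concrete, here is a toy version of how a static embedding model turns text into a vector. It looks up a pre-computed vector for each token and takes the mean, with no attention layers involved. The 4-dimensional `TOKEN_VECTORS` table is invented for illustration. `potion-code-16M` uses 256 dimensions, a learned subword tokenizer, and learned vectors that this sketch does not reproduce.

```python
import math

DIM = 4

# Invented toy table standing in for a static model's pre-computed token embeddings.
TOKEN_VECTORS = {
    "verify": [0.9, 0.1, 0.0, 0.2],
    "check": [0.8, 0.2, 0.1, 0.1],
    "token": [0.1, 0.9, 0.2, 0.0],
    "jwt": [0.2, 0.8, 0.1, 0.1],
    "render": [0.0, 0.1, 0.9, 0.3],
    "html": [0.1, 0.0, 0.8, 0.4],
}


def embed(text):
    """Mean-pool the vectors of known tokens: a table lookup plus an average, no attention."""
    vecs = [TOKEN_VECTORS[t] for t in text.lower().split() if t in TOKEN_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = embed("check jwt")
print(cosine(query, embed("verify token")))  # close to 1.0
print(cosine(query, embed("render html")))   # much lower
```

"check jwt" and "verify token" share no words, yet they land close together in vector space. A purely lexical retriever like BM25 would miss that kind of match.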
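The query-time stage can be sketched in plain Python in three steps: an Okapi BM25 ranking, reciprocal rank fusion (RRF) with a second ranked list, and a simple boost for chunks that define a queried name. This is a minimal sketch under several assumptions, not Semble's code:

- The semantic ranking is hard-coded as if it came from the embedding retriever.
- `k=60` is the constant commonly used for RRF, not a value confirmed for Semble.
- The 1.5× multiplier in the hypothetical `boost_definitions` is a stand-in for Semble's actual re-ranker.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit,
    # so verify_token becomes ["verify", "token"].
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_ranking(query, chunks, k1=1.5, b=0.75):
    """Return chunk ids sorted by Okapi BM25 score, best first."""
    docs = {cid: tokenize(text) for cid, text in chunks.items()}
    n = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n
    df = Counter(term for toks in docs.values() for term in set(toks))
    scores = {}
    for cid, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[cid] = score
    return sorted(scores, key=scores.get, reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Score each id by the sum of 1 / (k + rank) over every ranked list it appears in."""
    fused = {}
    for ranking in rankings:
        for rank, cid in enumerate(ranking, start=1):
            fused[cid] = fused.get(cid, 0.0) + 1.0 / (k + rank)
    return fused


def boost_definitions(scores, chunks, query, factor=1.5):
    """Hypothetical re-ranker: multiply the score of chunks that define a queried name."""
    terms = set(tokenize(query))
    boosted = {}
    for cid, score in scores.items():
        match = re.match(r"\s*(?:async\s+def|def|class)\s+(\w+)", chunks[cid])
        defines_term = match is not None and bool(set(tokenize(match.group(1))) & terms)
        boosted[cid] = score * factor if defines_term else score
    return sorted(boosted, key=boosted.get, reverse=True)


CHUNKS = {
    "auth.py:verify_token": "def verify_token(token):\n    return jwt.decode(token, KEY)",
    "auth.py:login": "def login(user, pw):\n    token = issue(user)\n    return verify_token(token)",
    "README:auth": "Tokens are verified on every request before the handler runs.",
}
# Pretend output of the embedding retriever for the same query.
SEMANTIC = ["README:auth", "auth.py:verify_token", "auth.py:login"]

query = "verify token"
lexical = bm25_ranking(query, CHUNKS)
fused = reciprocal_rank_fusion([lexical, SEMANTIC])
print(boost_definitions(fused, CHUNKS, query))
# ['auth.py:verify_token', 'README:auth', 'auth.py:login']
```

RRF uses only rank positions, so the two retrievers' raw scores never need to be on the same scale. That is the usual reason to fuse ranks rather than scores.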

Key takeaways

01. CPU-only operation: no GPU is needed because the 16M-parameter model ships with pre-computed token embeddings and simply averages them when embedding chunks and queries; no attention pass runs during search, so a laptop handles it

02. Tree-sitter code-aware chunking: splits code at function and class boundaries, so each retrieved chunk is a complete syntactic unit (a whole function or class) rather than an arbitrary 512-token slice

03. Dual retrieval with reciprocal rank fusion: runs semantic (embedding) and lexical (BM25) search in parallel and merges the ranked lists, so exact identifier matches and conceptual queries both surface accurate results

04. MCP server with session-cached indexes: integrates into Claude Code, Cursor, Codex, and OpenCode via the Model Context Protocol; the index stays warm across agent turns, so repeated queries within a session skip re-indexing

05. GitHub URL and local path support: you can index a repo you have not cloned yet by passing a GitHub URL, not just projects already on disk

06. Reproducible benchmark: all 63 benchmark repos are pinned by revision in repos.json, so you can rerun the benchmark and check the 0.854 NDCG@10 figure yourself, although the relevance judgments still come from the authors' LLM-judged setup
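The 0.854 figure is an NDCG@10 score. One common formulation, with linear gains, is sketched below to show what rerunning the benchmark actually computes. It is an assumption that Semble's benchmark uses linear rather than exponential gains; that is not confirmed here. The relevance lists are made-up examples, not benchmark data.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain: the item at rank i (1-based) contributes rel / log2(i + 1)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked, judged, k=10):
    """DCG of the returned ranking divided by the DCG of the best possible ordering.

    ranked: relevance of each returned result, in rank order.
    judged: relevance of every judged result for the query (defines the ideal).
    """
    ideal = dcg_at_k(sorted(judged, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal if ideal > 0 else 0.0


# Hypothetical judgments for three queries, each with one relevant chunk.
queries = [
    ([1, 0, 0], [1]),  # relevant chunk ranked first
    ([0, 1, 0], [1]),  # relevant chunk ranked second
    ([0, 0, 0], [1]),  # relevant chunk missing from the top 10
]
mean_ndcg = sum(ndcg_at_k(r, j) for r, j in queries) / len(queries)
print(round(mean_ndcg, 3))  # 0.544
```

A reported score is typically the mean over all benchmark queries. A high value means relevant chunks tend to appear at or near the top of the ten results.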

Should you care?

Who it’s for

If you use Claude Code, Cursor, or Codex on repositories over 10,000 lines and your API costs are climbing from agent file-reads, Semble targets that specific problem. It is also worth exploring if you are building MCP-based agent infrastructure and need a fast, local retrieval primitive. It is not a fit yet if you need stable Windows support, production SLAs, or reliable retrieval in languages other than Python, Java, JavaScript, Go, PHP, and Ruby, the only six languages the underlying model was trained on.

Worth exploring

Worth a weekend experiment if you actively use Claude Code or Cursor on a mid-to-large repo and want to measure context costs; the 1.5 ms query latency and CPU-only operation are verified advantages. Hold off on production adoption for three reasons:

- The benchmark was designed and judged by Claude Sonnet 4.6, the same model used in the target workflow.
- There is no independent validation.
- At least one real-world test found higher total API cost despite smaller context, because of tool-call overhead.

At v0.1.8 and 18 days old, the project is promising but not yet stable.

