GitHub Repos beginner 3 min read May 19, 2026 · Updated May 20, 2026

Semble: A semantic code search engine for AI agents

“A 16M-parameter model loses only 0.008 NDCG@10 against a 137M-parameter transformer — and indexes 218× faster. On a laptop. No GPU.”

Source · github.com

You know that feeling when your AI coding agent burns through your API budget reading entire files just to find one function? Agents that default to grep or ripgrep tend to pull in whole files, sending tens of thousands of tokens for every lookup. A single question about your authentication flow can trigger 45,692 tokens of context, most of it irrelevant surrounding code. There is no built-in way to give an agent a chunk-level, semantically aware code index that runs locally, needs no API keys, and answers in milliseconds.

code-search · embeddings · mcp · mcp-server · agents · devtools · python

When you run a Semble search, it parses your codebase with tree-sitter, a grammar-aware parser that splits code at real boundaries such as function and class definitions rather than at arbitrary line counts. Each chunk gets two representations: a 256-dimensional embedding from `potion-code-16M` (which averages pre-computed token embeddings without running attention, like averaging word meanings instead of reading the whole sentence in order) and an entry in a BM25 lexical index. At query time, both retrievers rank candidates independently, reciprocal rank fusion merges the two lists, and a post-fusion re-ranker boosts definition matches and penalizes noise. The full round trip takes ~1.5 ms on a warm index, with no GPU and no network call.
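The embedding step described above can be illustrated with a toy sketch. It assumes a hypothetical three-dimensional lookup table (`TOKEN_VECTORS`) and pre-split tokens; the real model's vocabulary, tokenizer, and 256-dimensional vectors are not shown in the source, and `embed` and `cosine` are illustrative names, not Semble's API.

```python
import math

# Hypothetical toy table: token -> pre-computed vector.
# Three dimensions here for readability; the article cites 256 for the real model.
TOKEN_VECTORS = {
    "parse": [0.9, 0.1, 0.0],
    "token": [0.8, 0.2, 0.1],
    "login": [0.1, 0.9, 0.2],
    "user": [0.0, 0.8, 0.3],
}
DIMS = 3


def embed(tokens):
    """Average the stored vectors of known tokens. No attention pass is run."""
    known = [TOKEN_VECTORS[t] for t in tokens if t in TOKEN_VECTORS]
    if not known:
        # Nothing recognised: fall back to the zero vector.
        return [0.0] * DIMS
    return [sum(vec[i] for vec in known) / len(known) for i in range(DIMS)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / norm


# Two pretend code chunks, already tokenised.
auth_chunk = embed(["login", "user"])
parser_chunk = embed(["parse", "token"])
query = embed(["login"])

# The auth chunk should score higher than the parser chunk for this query.
print(cosine(query, auth_chunk) > cosine(query, parser_chunk))
```

Because the per-token vectors are looked up rather than computed, query-time cost in this sketch is a handful of additions and one division per dimension, which is the intuition behind the CPU-only claim.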

01 CPU-only operation: no GPU is needed because the 16M-parameter model pre-computes token embeddings at build time and averages them at query time. No attention pass runs during search, so a laptop can handle it.
02 Tree-sitter code-aware chunking: splits code at function and class boundaries, so each retrieved chunk is a complete syntactic unit you can actually use, not an arbitrary 512-token slice.
03 Dual retrieval with reciprocal rank fusion: runs semantic (embedding) and lexical (BM25) search in parallel and merges the ranked lists, so both exact identifier matches and conceptual queries surface relevant results.
04 MCP server with session-cached indexes: integrates into Claude Code, Cursor, Codex, and OpenCode via the Model Context Protocol. The index stays warm across agent turns, so repeated queries within a session skip re-indexing.
05 GitHub URL and local path support: you can index a repo you have not cloned yet by passing a GitHub URL, not just projects already on disk.
06 Reproducible benchmark: all 63 benchmark repos are pinned by revision in repos.json, so you can check the 0.854 NDCG@10 claim yourself without trusting the authors.
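The fusion step named in the dual-retrieval item can be sketched in a few lines. This is a generic reciprocal rank fusion, not Semble's code: the constant `k=60` is the value commonly used in the literature and is an assumption here, and the chunk identifiers are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids into one list.

    Each id earns 1 / (k + rank) from every list it appears in,
    with rank starting at 1. Higher total score ranks first.
    k=60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers for one query.
semantic_hits = ["auth.login", "auth.refresh", "db.connect"]
lexical_hits = ["auth.refresh", "utils.hash", "auth.login"]

fused = reciprocal_rank_fusion([semantic_hits, lexical_hits])
print(fused)
```

Fusion works on ranks rather than raw scores, so the embedding similarities and BM25 scores never need to be put on a common scale. A chunk that both retrievers rank highly rises above one that only a single retriever found.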
Who it’s for

If you use Claude Code, Cursor, or Codex on repositories over 10,000 lines and your API costs are climbing from agent file-reads, Semble targets that specific problem. It is also worth exploring if you are building MCP-based agent infrastructure and need a fast, local retrieval primitive. It is not yet a fit if you need stable Windows support, production SLAs, or reliable retrieval in languages other than Python, Java, JavaScript, Go, PHP, and Ruby; the underlying model was trained only on those six.

Worth exploring

Worth a weekend experiment if you actively use Claude Code or Cursor on a mid-to-large repo and want to measure context costs: the ~1.5 ms query latency and CPU-only operation are verified advantages. Hold off on production adoption. The benchmark was designed and judged by Claude Sonnet 4.6 (the same model in the target workflow), independent validation is absent, and at least one real-world test found a higher total API cost despite the smaller context, because of tool-call overhead. At v0.1.8 and 18 days old, the project is promising but not stable.
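For anyone re-running the benchmark rather than taking the reported score on trust, NDCG@10 itself is simple to compute. The sketch below uses the linear-gain form of the metric and, as a simplification, derives the ideal ordering from the returned list only; a full evaluation would build it from every judged chunk for the query. The function names are illustrative and the relevance grades are made up.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG divided by the DCG of the best possible ordering of the same grades."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        # No relevant result at all: define the score as 0.
        return 0.0
    return dcg_at_k(relevances, k) / ideal


# Hypothetical relevance grades for one query's results, in ranked order.
perfect_order = [1, 0, 0]
relevant_in_second_place = [0, 1]

print(ndcg_at_k(perfect_order))
print(round(ndcg_at_k(relevant_in_second_place), 4))
```

A benchmark score is the mean of this value over all queries, so a gap as small as 0.008 is only meaningful if the relevance grades themselves are trustworthy, which is the concern with a model-judged benchmark.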
