You know that feeling when your AI coding agent burns through your API budget reading entire files just to find one function? Agents that rely on grep or ripgrep pull in whole files by default, sending tens of thousands of tokens for every lookup. A single question about your authentication flow can trigger 45,692 tokens of context, most of it surrounding code that has nothing to do with the question. There is no built-in way to give an agent a chunk-level, semantically aware code index that runs locally, needs no API keys, and answers in milliseconds.
When you run a Semble search, it parses your codebase with tree-sitter, a grammar-aware parser that splits code at real boundaries such as function and class definitions rather than at arbitrary line counts. Each chunk gets two representations: a 256-dimensional embedding from `potion-code-16M` and a BM25 lexical index. The embedding model averages pre-computed token embeddings without running attention, much like averaging word meanings instead of reading the whole sentence in order. At query time, both retrievers rank candidates independently, reciprocal rank fusion merges the two lists, and a post-fusion re-ranker boosts definition matches and penalizes noise. The full round trip takes about 1.5 ms on a warm index, with no GPU and no network call.
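To make the fusion step concrete, here is a minimal sketch of reciprocal rank fusion in plain Python. It is not Semble's code: the function name, the chunk ids, and the smoothing constant `k=60` (a value commonly used with this technique) are all assumptions chosen for illustration, and the post-fusion re-ranker is left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several best-first ranked lists of chunk ids into one list.

    Each chunk earns 1 / (k + rank) from every list it appears in.
    k=60 is an assumed, conventional constant, not a documented Semble setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk id for determinism.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


# Hypothetical results from the two retrievers for one query.
semantic = ["auth.login", "auth.verify_token", "db.connect"]
lexical = ["auth.verify_token", "utils.hash_password", "auth.login"]

fused = reciprocal_rank_fusion([semantic, lexical])
print(fused)
```

A chunk that both retrievers rank highly, such as `auth.verify_token` here, rises above chunks that only one retriever found. Because the method uses only ranks, it never has to reconcile BM25 scores with embedding similarities, which live on different scales.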
If you use Claude Code, Cursor, or Codex on repositories over 10,000 lines and your API costs are climbing from agent file-reads, Semble targets that specific problem. It is also worth exploring if you are building MCP-based agent infrastructure and need a fast, local retrieval primitive. Not useful yet if you need stable Windows support, production SLAs, or reliable retrieval across languages outside Python, Java, JavaScript, Go, PHP, and Ruby — the underlying model was trained only on those six.
Worth a weekend experiment if you actively use Claude Code or Cursor on a mid-to-large repo and want to measure context costs: the 1.5 ms query latency and CPU-only operation are verified advantages. Hold off on production adoption. The benchmark was designed and judged by Claude Sonnet 4.6 (the same model in the target workflow), independent validation is absent, and at least one real-world test found that total API cost went up even though context went down, because of tool-call overhead. At v0.1.8 and 18 days old, the project is promising but not stable.