R&D intermediate 2 min read Mar 16, 2026 · Updated Mar 19, 2026

DoorDash's semantic search uses 32M labels to match queries with products

“Off-the-shelf CLIP failed on DoorDash's e-commerce queries. They built their own model with 32M labels and deployed it to 100% of traffic.”

Source · infoq.com

“Off-the-shelf models lack the specificity of the e-commerce domain and frequently fail when used on short but specific queries.” (DoorDash ML Team, DashCLIP paper)

You know that feeling when a customer searches 'healthy snack' and your search returns nothing because your products are labeled 'organic granola bar'? Traditional search relies on keyword matching and engagement history, so it can't tell that a photo of chips and the word 'crunchy' describe the same thing. Off-the-shelf vision-language models like CLIP work well on general images but fail on e-commerce queries because they don't understand product categories, aisle layouts, or shopping intent. Before DashCLIP, semantically relevant queries could return zero results when the wording differed. After DashCLIP, the system retrieves the right products even when the words don't match.

semantic-search · multimodal-ml · e-commerce · embeddings · contrastive-learning · vision-language · information-retrieval

Think of DashCLIP as a universal translator that converts product images, product text, and search queries into the same language: vectors. Stage 1: Take a pretrained vision-language model (BLIP-14M) and continue training it on 400K DoorDash product images and titles. This teaches the model what grocery products look like. Stage 2: Train a separate query encoder that maps user searches into the same vector space as products. The key innovation is the Query-Catalog Contrastive (QCC) loss, which pulls relevant query-product pairs closer together while pushing irrelevant pairs apart. For training data, DoorDash used 700K human labels to fine-tune GPT, which then generated 32M labeled pairs. At inference, the system encodes the query, finds the nearest product vectors, and ranks them.
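The "pull together, push apart" idea can be illustrated with a small sketch. The exact form of the QCC loss is not given here, so the code below swaps in a generic symmetric InfoNCE-style contrastive loss as a stand-in; it is not DoorDash's implementation. The function names, the temperature value of 0.07, and the toy 3-dimensional vectors are all assumptions made for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def cross_entropy_diagonal(logits):
    """Mean of -log softmax(row i)[i]: the correct match for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def contrastive_loss(queries, products, temperature=0.07):
    """Generic symmetric contrastive loss (assumed stand-in for QCC).

    queries[i] and products[i] are a relevant pair; every other product in
    the batch acts as an irrelevant (negative) example for queries[i].
    """
    q = [normalize(v) for v in queries]
    p = [normalize(v) for v in products]
    logits = [[sum(a * b for a, b in zip(qi, pj)) / temperature for pj in p]
              for qi in q]
    transposed = [list(col) for col in zip(*logits)]
    # Average the query-to-product and product-to-query directions.
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(transposed))

# Toy embeddings (hypothetical): three products and three matching queries.
products = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
shuffled = [aligned[1], aligned[2], aligned[0]]  # deliberately mismatched pairs

loss_aligned = contrastive_loss(aligned, products)
loss_shuffled = contrastive_loss(shuffled, products)
```

When queries sit next to their own products, the loss is close to zero; when the pairs are mismatched, it is much larger. Training a query encoder against a loss of this general shape is what moves relevant pairs together in the shared vector space.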

01. Two-stage training pipeline. Why YOU care: Stage 1 adapts a generic model to your domain (400K products) and Stage 2 aligns queries with products, giving you embeddings that actually understand your specific catalog.
02. Query-Catalog Contrastive (QCC) loss. Why YOU care: a custom loss function designed for e-commerce that outperforms generic contrastive learning, giving you better retrieval accuracy on short, specific queries.
03. LLM-augmented labeling. Why YOU care: start with 700K human labels and use GPT to expand them to 32M. This eliminates the position and selection bias of engagement data while keeping labeling costs manageable.
04. Multimodal product representation. Why YOU care: it combines an image encoder, a text encoder, and an image-grounded text encoder into one representation, so products with poor descriptions but good images still get found.
05. Generalizable embeddings. Why YOU care: the same embeddings work for retrieval, ranking, aisle categorization, and relevance prediction, so one model serves multiple downstream tasks.
06. Production-proven at scale. Why YOU care: deployed to 100% of DoorDash sponsored product traffic with statistically significant improvements in CTR and conversion rate.
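The retrieval use of these embeddings (encode the query, find the nearest product vectors, rank) can be sketched as a brute-force cosine-similarity search. This is a minimal illustration under stated assumptions: the catalog entries, the 3-dimensional vectors, and the `retrieve` helper are hypothetical, and real embeddings would come from the trained encoders. A production system would more likely use an approximate nearest-neighbor index than a full scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, catalog, k=2):
    """Return the names of the k products most similar to the query vector."""
    scored = [(cosine(query_vec, vec), name) for name, vec in catalog.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Hypothetical precomputed product embeddings (toy values, not real model output).
catalog = {
    "organic granola bar": [0.9, 0.1, 0.0],
    "potato chips": [0.1, 0.9, 0.1],
    "dish soap": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of the query "healthy snack".
healthy_snack = [0.8, 0.3, 0.0]

top = retrieve(healthy_snack, catalog, k=2)
# top == ["organic granola bar", "potato chips"]
```

Note that the query shares no words with "organic granola bar"; the match comes entirely from the vectors being close, which is the behavior keyword search cannot provide.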
Who it’s for

If you're building or improving search/recommendations for an e-commerce platform, marketplace, or any product catalog — this shows you how to move beyond keyword matching to semantic understanding. Also relevant if you're evaluating whether to fine-tune off-the-shelf vision-language models vs build from scratch. Not useful if you don't have a product catalog or if your search problem is purely text-based without visual content.

Worth exploring

The architecture is production-proven and the paper provides enough detail to replicate. The key insight — that off-the-shelf models fail on domain-specific e-commerce queries — is broadly applicable. The LLM-augmented labeling approach (700K human → 32M GPT-generated) is a practical pattern you can borrow. The one caveat: DoorDash has significant ML infrastructure; you'll need to adapt this to your scale. If you're doing e-commerce search, this is worth studying closely.
