R&D intermediate 3 min read May 27, 2026

Carbon: HuggingFace's DNA Model That Runs on One GPU

“Evo2-7B needs 8 GPUs to analyze long DNA. Carbon-3B does most of the same tasks on one GPU at 100,000 base pairs per second — and the weights are Apache 2.0.”

Source · huggingface.co

“'The ll_correct metric is inflated since negative examples contain 24 bp mismatches, creating artificially large likelihood gaps. The true headline metric is gen_exact_match.' — Carbon evaluation README (github.com/huggingface/carbon/blob/main/evaluation/README.md)”

You want to score thousands of disease variants against a reference genome, generate synthetic coding sequences, or run a genomic needle-in-a-haystack retrieval, but Evo2-7B requires multi-GPU sharding that your lab's H100 budget doesn't cover. GENERator-v2 caps effective causal retrieval at ~16k tokens despite its 1M bp context claim. Encoder-only models like DNABERT-2 can't generate sequences. You end up either renting cloud compute at $20+/hour or using underpowered models that lose on key benchmarks.

dna · genomics · bioinformatics · open-source · transformer · llm · huggingface

Carbon treats DNA like compressed text: instead of reading one nucleotide (A, C, G, T) at a time, it reads six at once as a single 6-mer token. That 6× compression means a sequence of roughly 197,000 base pairs fits in 32,768 tokens, making attention feasible on a single GPU. The catch is that 6-mer tokenization loses per-nucleotide resolution during prediction: a token such as 'ATGCGC' either matches or doesn't. Factorized Nucleotide Supervision (FNS) addresses this by factoring each 6-mer prediction into six independent per-position probability distributions during training, so gradients flow at single-nucleotide granularity. At inference, you wrap any DNA sequence in `<dna>...</dna>` tags (without them, the model treats DNA as English and performance collapses), optionally prefix species and gene-type metadata tokens, and run standard autoregressive generation. For variant effect prediction, you score the same sequence twice, with and without the mutation, and take the log-likelihood delta.
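The factorization and the variant-scoring delta described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `toy_position_probs` is a hypothetical stand-in for a model's output head (it is not Carbon's API), and the sketch shows the factorized likelihood arithmetic, not the FNS training procedure itself.

```python
import math

K = 6  # nucleotides per token, per the 6-mer scheme described above

def kmer_tokenize(seq, k=K):
    """Split a DNA string into non-overlapping k-mers (any trailing remainder is dropped)."""
    usable = len(seq) - len(seq) % k
    return [seq[i:i + k] for i in range(0, usable, k)]

def toy_position_probs(prev_kmer, k=K):
    """Hypothetical stand-in for a factorized output head (NOT the real model).

    Returns k independent distributions over A/C/G/T. This toy favours
    repeating the base seen at the same position in the previous k-mer.
    """
    if prev_kmer is None:
        return [{b: 0.25 for b in "ACGT"} for _ in range(k)]
    return [{b: (0.7 if b == prev_kmer[i] else 0.1) for b in "ACGT"} for i in range(k)]

def factorized_log_prob(kmer, position_probs):
    """Log-probability of one k-mer as a sum of per-position log-probabilities."""
    return sum(math.log(position_probs[i][base]) for i, base in enumerate(kmer))

def sequence_log_likelihood(seq):
    """Autoregressive log-likelihood over k-mer tokens under the toy model."""
    total, prev = 0.0, None
    for kmer in kmer_tokenize(seq):
        total += factorized_log_prob(kmer, toy_position_probs(prev))
        prev = kmer
    return total

def variant_effect(reference, mutant):
    """Log-likelihood delta; negative values mean the mutant is less likely."""
    return sequence_log_likelihood(mutant) - sequence_log_likelihood(reference)

reference = "ACGTAC" * 3
mutant = reference[:7] + "G" + reference[8:]  # single substitution C -> G at index 7
delta = variant_effect(reference, mutant)
print(f"log-likelihood delta: {delta:.4f}")
```

Because each 6-mer's probability is a product of six per-position terms, a single substituted base changes only the terms it touches, which is what gives the score its single-nucleotide resolution in this sketch.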

01. Single-GPU inference for the 3B model: you run variant scoring on one H100 at >100k bp/s instead of provisioning a multi-GPU cluster, cutting infrastructure cost for a standard lab workload.
02. FNS checkpoint (revision='fns'): exposes per-position nucleotide probabilities at inference, so you get single-base-resolution scoring without the compute overhead of character-level models.
03. Metadata-conditioned generation: prefix any sequence with species-type and gene-type tokens (e.g. `<vertebrate_mammalian><protein_coding_region>`) to steer generation toward species-specific sequence patterns.
04. Carbon-500M draft model for speculative decoding: a purpose-built small model accelerates Carbon-3B/8B generation by proposing several tokens ahead that the larger model then verifies in a single pass, reducing wall-clock time for long continuations.
05. GGUF variants for all three model sizes: Carbon-500M-GGUF, Carbon-3B-GGUF, and Carbon-8B-GGUF let you run inference via llama.cpp without a CUDA environment.
06. YaRN context extension to 393kbp: applied at inference time (factor=4.0), it stretches the native 32k context to 65k tokens and raises the NIAH score at 32k from 0.55 to 0.90 without retraining.
07. Apache 2.0 license on all weights: no usage restrictions, no API gating, and no terms that block commercial derivative models.
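
The context and prompt conventions in the list above reduce to simple arithmetic and string assembly. The sketch below is a hedged illustration: `tokens_to_bp` and `build_prompt` are hypothetical helper names, and placing the metadata tokens before the opening `<dna>` tag (as well as leaving the tag open for generation) is an assumption, not something the source confirms.

```python
K = 6  # nucleotides per 6-mer token

def tokens_to_bp(n_tokens, k=K):
    """Base pairs covered by a context window of n_tokens k-mer tokens."""
    return n_tokens * k

def build_prompt(seq, metadata_tokens=(), close=True):
    """Wrap a sequence in <dna> tags, with optional metadata tokens in front.

    Assumptions of this sketch: metadata goes before the opening tag, and
    close=False leaves the tag open so a model could continue the sequence.
    """
    body = f"<dna>{seq}</dna>" if close else f"<dna>{seq}"
    return "".join(metadata_tokens) + body

native_bp = tokens_to_bp(32_768)    # native 32k-token context
extended_bp = tokens_to_bp(65_536)  # 65k-token context after extension
seconds_at_quoted_rate = native_bp / 100_000  # about 2 s at the quoted 100,000 bp/s

prompt = build_prompt(
    "ATGGCC",
    metadata_tokens=("<vertebrate_mammalian>", "<protein_coding_region>"),
)
print(native_bp, extended_bp, round(seconds_at_quoted_rate, 2))
print(prompt)
```

The two products, 196,608 bp and 393,216 bp, are where the article's rounded "197,000 base-pair" and "393kbp" figures come from.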
Who it’s for

If you work in computational biology, bioinformatics, or genomics research and want to run zero-shot variant effect prediction, sequence generation, or embedding-based analysis on eukaryotic genomes, Carbon is built for your workload. It is also relevant if you are building a bioinformatics SaaS product and need a DNA foundation model you can commercially deploy without licensing friction. Not useful yet if your research focuses on bacteria, archaea, or phage biology — the model's prokaryotic performance only matches GENERator-v2-prokaryote-3B rather than beating it, and 85% of training data ...

Worth exploring

Worth exploring if you have a single H100 and need a production-capable DNA generation or VEP pipeline right now — the Apache 2.0 license, GGUF variants, and vLLM support lower the barrier to deployment substantially for a one-week-old release. Be cautious about the long-context retrieval story: native 32k NIAH scores 0.55 (vs Evo2's 0.95), and the model's own eval README flags the `ll_correct` metric as inflated — always use `gen_exact_match` for honest benchmarking. The throughput claim (150× vs 250× vs 275× depending on the source) is not backed by a single reproducible table, so benchmark your specific workload before committing.
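
Benchmarking your own workload, as suggested above, can start from a harness as small as the one below. It is a minimal sketch with assumptions stated plainly: `echo_model` is a hypothetical placeholder for whatever generation call you actually use, and "exact match" is taken here to mean the fraction of generated continuations identical to the expected sequence, which may differ from how the Carbon evaluation code defines `gen_exact_match`.

```python
import time

def benchmark(generate_fn, cases):
    """Report exact-match rate and throughput for (prompt, expected) pairs.

    generate_fn(prompt, n) must return a string of n generated bases.
    """
    if not cases:
        raise ValueError("cases must not be empty")
    matches = 0
    total_bp = 0
    start = time.perf_counter()
    for prompt, expected in cases:
        output = generate_fn(prompt, len(expected))
        matches += int(output == expected)
        total_bp += len(prompt) + len(output)
    elapsed = time.perf_counter() - start
    return {
        "exact_match": matches / len(cases),
        "bp_per_second": total_bp / elapsed if elapsed > 0 else float("inf"),
    }

def echo_model(prompt, n):
    """Hypothetical stand-in for a real model call: repeats the prompt's last base."""
    return prompt[-1] * n

cases = [("ACGTAA", "AAA"), ("ACGTAC", "CCG")]
report = benchmark(echo_model, cases)
print(report)
```

Swapping `echo_model` for a real inference call and `cases` for sequences from your own pipeline gives a throughput number you can compare against the published multipliers on your hardware.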
