GitHub Repos intermediate 3 min read May 13, 2026

LLMs from Scratch: Build a GPT in Pure PyTorch, No LLM Libraries

“A Mac Mini M4 CPU with KV cache (130–224 tok/sec) outran an A100 GPU (26–99 tok/sec) in a 500-line pure PyTorch Gemma 3 implementation — and 57 HN engineers immediately asked why the GPU lost.”

Source · github.com

“"at 500 lines long, it's much, much, much more digestable than a lot of the vomit that comes out of so-called production systems" — mdaniel, Hacker News (https://news.ycombinator.com/item?id=44962059)”

You know that feeling when you copy a HuggingFace tutorial, get it working, and still have no idea why the attention mask looks the way it does or what the loss curve is actually telling you? Fine-tuning a model without understanding its internals means every unexpected behavior is a mystery you can't debug. You read blog posts explaining transformers with colored boxes and animations, but translating that into inspectable, debuggable code is a different skill entirely. This repo closes that gap: every building block is written in plain PyTorch so you can step through it in a debugger and see the numbers change.

llm · pytorch · education · transformers · gpt · deep-learning · open-source

You start with raw text, implement a tokenizer that splits it into numeric tokens, then build multi-head self-attention from scratch using PyTorch matrix operations — no imported attention layer, just tensor math. Once attention works, you stack it into a GPT architecture identical in structure to GPT-2. Chapter 5 walks you through a pretraining loop on unlabeled data; Chapters 6 and 7 cover classification and instruction fine-tuning on the same codebase. Each notebook is self-contained so you can run any chapter independently and inspect intermediate tensor shapes. The repo also loads pretrained GPT-2 weights from OpenAI, so you can verify your architecture matches a real model by comparing outputs.
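The "just tensor math" claim is easy to verify. Here is a minimal single-head causal self-attention in raw PyTorch ops; the function and variable names are my own for illustration, and the repo's actual implementation is multi-head with learned `nn.Linear` projections:

```python
import math
import torch

def causal_self_attention(x, W_q, W_k, W_v):
    """Single-head causal self-attention from raw tensor ops.
    x: (seq_len, d_model); W_q, W_k, W_v: (d_model, d_head).
    Hypothetical names; the repo wraps this in proper modules."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v               # project tokens to queries/keys/values
    scores = q @ k.T / math.sqrt(k.shape[-1])          # scaled dot-product similarity
    mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))   # causal mask: position i can't see j > i
    weights = torch.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ v                                 # attention-weighted mix of values

torch.manual_seed(0)
d_model, d_head, seq = 8, 4, 5
x = torch.randn(seq, d_model)
Wq, Wk, Wv = (torch.randn(d_model, d_head) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # torch.Size([5, 4])
```

Stepping through this in a debugger makes the causal mask concrete: row 0 of `weights` is `[1, 0, 0, 0, 0]`, so the first token's output is exactly its own value vector.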

01
Zero external LLM libraries — every transformer component (tokenizer, attention, positional embeddings, training loop, sampling, LoRA) is written inline in the notebook so you read the exact computation rather than a wrapper function name
02
7 structured chapters plus 5 appendices following the Manning book's narrative — the chapter order matches a printed curriculum you can return to, unlike a video series that may reorder or disappear
03
Loads real pretrained GPT-2 weights — you can verify your from-scratch implementation matches OpenAI's architecture by loading weights and checking outputs, a concrete correctness test not available in tutorials that skip this step
04
Bonus chapter: Gemma 3 270M in roughly 500 lines — extends the educational approach to a 2025 production architecture and shows how it differs (KV cache, grouped-query attention) from the book's GPT-2-class model
05
Runs on a laptop without a GPU — CI tests pass on macOS, Linux, and Windows; no cloud credits needed to complete the core curriculum (GPU auto-detected when available)
06
Active CI on 3 platforms using uv — the notebooks are tested automatically on each push, so you're not dealing with code that worked two years ago but silently breaks today
07
Direct sequel available — rasbt/reasoning-from-scratch (4,300 stars) picks up where this ends, covering RL and distillation for reasoning models if you want to continue after the 7 chapters
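To illustrate why the KV cache called out in point 04 matters: during autoregressive decoding, the keys and values of past tokens never change, so caching them lets each step project only the newest token instead of re-projecting the whole prefix. A toy single-head sketch under that assumption (names and shapes are illustrative, not the repo's):

```python
import math
import torch

torch.manual_seed(0)
d = 4
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
tokens = torch.randn(6, d)  # stand-in embeddings for 6 decoded tokens

def attend(q, K, V):
    """Attention for one query against all cached keys/values."""
    w = torch.softmax(q @ K.T / math.sqrt(d), dim=-1)
    return w @ V

# Incremental decoding: compute k/v for the NEW token only, append to the cache.
K_cache = torch.empty(0, d)
V_cache = torch.empty(0, d)
cached_outs = []
for x in tokens:
    K_cache = torch.cat([K_cache, (x @ Wk).unsqueeze(0)])
    V_cache = torch.cat([V_cache, (x @ Wv).unsqueeze(0)])
    cached_outs.append(attend(x @ Wq, K_cache, V_cache))

# Reference: recompute the last position's attention from scratch.
full = attend(tokens[-1] @ Wq, tokens @ Wk, tokens @ Wv)
assert torch.allclose(cached_outs[-1], full, atol=1e-5)
```

Causality falls out for free here: at each step the cache only contains the prefix, so no mask is needed during generation. That bookkeeping, not the math, is most of what separates the Gemma 3 bonus chapter from the book's GPT-2 loop.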
Who it’s for

If you have intermediate Python and basic ML knowledge and want to go from 'I know transformers exist' to 'I can read and write transformer code without a library', this is the most structured path of its kind. You benefit most if you're a backend or ML engineer who uses HuggingFace daily but can't confidently explain why causal masking works or how LoRA modifies weight matrices. It's not for you if you need production deployment patterns, RAG pipelines, or multi-GPU training — the Manning page explicitly excludes those from scope.
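On the LoRA question raised above: the core idea is that the pretrained weight stays frozen while training only a low-rank correction, so the forward pass becomes y = Wx + (α/r)·BAx. A toy sketch with made-up dimensions (the repo's appendix implements this as a proper module):

```python
import torch

torch.manual_seed(0)
d_in, d_out, r, alpha = 16, 16, 4, 8    # rank r << d is the whole trick

W = torch.randn(d_out, d_in)            # frozen pretrained weight (never updated)
A = torch.randn(r, d_in) * 0.01         # trainable down-projection
B = torch.zeros(d_out, r)               # trainable up-projection, zero-initialized

x = torch.randn(d_in)
# LoRA forward: base output plus a low-rank correction, scaled by alpha / r
y = W @ x + (alpha / r) * (B @ (A @ x))

# Because B starts at zero, the adapted model is initially identical to the base.
assert torch.allclose(y, W @ x)
```

Only A and B are trained, r·(d_in + d_out) parameters instead of d_in·d_out, which is why LoRA fine-tuning fits where full fine-tuning doesn't.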

Worth exploring

Yes, if your goal is foundational LLM understanding — the 94k stars, active CI, and a published Manning book behind it make this the most validated educational resource in the category. Know this upfront: the core 7-chapter curriculum is frozen to match the printed book, keeping it anchored to GPT-2-era architecture; newer material lives in bonus chapters. Apple Silicon MPS users should expect reproducibility issues (open issue #977) and should run on CPU or CUDA instead until that is resolved.

Developer playbook
Tech stack, code snippet, sentiment, alternatives.
PM playbook
Adoption angles, user fit, positioning.
CEO playbook
Traction signals, ROI, build vs buy.
Deep-dive insight
Full long-form analysis, no fluff.
Easy mode
Core idea, fast — when you need the gist.
Pro mode
Technical nuance, edge cases, tradeoffs.
Read the full digest
Go beyond the preview

Deep-dive insight, Easy and Pro modes, plus action playbooks — the full breakdown is one tap away.

Underrated tools. Unfiltered takes.

Install Snaplyze →