“The key invariant is not 'all pages are consistent with each other.' It is 'all pages are at least as old as the redo point, and the WAL from the redo point forward is complete.' — Gaurav Sarma, explaining PostgreSQL's fuzzy checkpointing”
You know that feeling when your database latency histogram shows a cliff every few minutes? That's checkpointing. Your database has gigabytes of dirty pages in memory that need to hit disk for crash recovery. The naive approach—pause all writes, flush everything, resume—gives you a consistent snapshot but also gives you multi-hundred-millisecond stalls. For a system doing 50,000 writes per second, that stall shows up as a cliff in your p99 latency every time the checkpoint fires. Every major database has had to solve this, and the solutions are more varied than you'd expect.
Think of it like taking a photo of a moving crowd. You can't freeze everyone in place, so you use tricks: take multiple photos and stitch them together (PostgreSQL's WAL replay), have people step into a side room for their photo (SQLite's WAL file), or create a duplicate crowd that stands still while the original keeps moving (Redis's fork). PostgreSQL records a 'redo point' in its write-ahead log, flushes dirty pages in the background while writes continue, and on recovery replays the log forward from that point. SQLite appends all changes to a separate WAL file; a checkpoint copies those pages back into the main database file, but only as far as the oldest active reader's snapshot allows. Redis calls fork() to get a child process with a copy-on-write view of memory, then the child serializes that frozen view to disk while the parent keeps serving requests.
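To make the redo-point idea concrete, here is a minimal toy sketch in Python. It is not PostgreSQL's implementation: the `ToyEngine` class, its method names, and the `recover` function are all hypothetical, pages are reduced to single key-value pairs, and the WAL is assumed to be durable the moment a record is appended. It only illustrates why flushed pages do not need to be consistent with each other, as long as the log from the redo point forward is complete.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    """Hypothetical toy engine: an append-only WAL, a buffer pool, and a 'disk'."""
    wal: list = field(default_factory=list)     # (lsn, key, value); assumed durable
    buffer: dict = field(default_factory=dict)  # key -> (value, lsn), in memory only
    dirty: set = field(default_factory=set)     # keys modified since their last flush
    disk: dict = field(default_factory=dict)    # key -> (value, lsn) as last flushed
    redo_point: int = 0                         # recovery replays records with lsn >= this

    def write(self, key, value):
        lsn = len(self.wal)
        self.wal.append((lsn, key, value))      # log first, then touch the page
        self.buffer[key] = (value, lsn)
        self.dirty.add(key)

    def begin_checkpoint(self):
        """Note the redo point and the pages dirty right now; writers keep going."""
        return len(self.wal), sorted(self.dirty)

    def flush_page(self, key):
        self.disk[key] = self.buffer[key]
        self.dirty.discard(key)

    def end_checkpoint(self, redo_lsn):
        # Only published once every page from begin_checkpoint has been flushed.
        self.redo_point = redo_lsn


def recover(disk, wal, redo_point):
    """Rebuild state from flushed pages plus the WAL from the redo point forward."""
    state = {key: value for key, (value, _lsn) in disk.items()}
    for lsn, key, value in wal:
        if lsn >= redo_point:
            state[key] = value                  # full-value records replay idempotently
    return state


engine = ToyEngine()
engine.write("a", 1)                            # lsn 0
engine.write("b", 2)                            # lsn 1

redo_lsn, pending = engine.begin_checkpoint()   # redo point is lsn 2
engine.flush_page(pending[0])                   # flush "a" (value 1)
engine.write("a", 10)                           # lsn 2: write lands mid-checkpoint
engine.write("c", 3)                            # lsn 3
engine.flush_page(pending[1])                   # flush "b"
engine.end_checkpoint(redo_lsn)

# Simulated crash: the buffer pool is lost, only the disk pages and WAL survive.
recovered = recover(engine.disk, engine.wal, engine.redo_point)
print(recovered)
```

In this run the disk holds a stale value for "a" and nothing at all for "c", yet replaying the two WAL records written after the redo point restores both. A real engine also compares each record's LSN against the page's own LSN and skips records the page already contains; the toy version gets away without that check only because every record carries the full value.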
If you're a backend engineer who's ever stared at a latency spike and wondered 'is this checkpointing?', or a DBA tuning PostgreSQL's checkpoint_completion_target without understanding why it matters, this is for you. Especially valuable if you're choosing between databases and want to understand their persistence trade-offs. Not useful if you don't work with databases that have crash recovery requirements, or if you only use managed database services where you can't tune checkpoint behavior.
This is one of the clearest explanations of database checkpointing you'll find. The author works on Stripe's database engine team and has 10 years of infrastructure experience, and it shows: the article includes actual code snippets, disk layouts, and a comparison table of trade-offs. Published March 15, 2026, so it's current. The only caveat: it assumes you know what WAL and buffer pools are, so complete beginners should read the PostgreSQL docs first.