How Databases Checkpoint to Disk Without Stopping the World
Snaplyze Digest
R&D · Intermediate · 3 min read · Mar 24, 2026 · Updated Apr 2, 2026

“Your database has gigabytes of dirty pages. They need to hit disk. Here's how 5 databases do it without pausing your app.”

In Short

A Stripe database engineer breaks down how PostgreSQL, SQLite, Redis, RocksDB, and MongoDB/WiredTiger flush dirty pages to disk without pausing writes. The article argues that every non-blocking checkpoint reduces to one of three primitives: WAL replay (flush pages asynchronously, then replay the log forward on recovery), side-channel merge (write changes to a separate file, then merge them back), and fork-based snapshots (copy-on-write memory via fork()). Each approach trades the write stall for something else (recovery time, memory overhead, or I/O bandwidth), and the author includes code snippets and disk layouts to show exactly how each system implements its chosen...

databases · systems-design · postgresql · redis · sqlite
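
To make the first primitive concrete, here is a minimal Python sketch of a fuzzy checkpoint followed by WAL replay. It is a toy model under stated assumptions, not PostgreSQL's implementation: `ToyDB`, `begin_checkpoint`, `flush_some`, `finish_checkpoint`, and `crash_and_recover` are hypothetical names, each "page" is a dictionary entry, and the "durable" data file and log are plain Python objects that survive the simulated crash.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDB:
    """Hypothetical fuzzy-checkpoint model; not PostgreSQL code."""
    buffer: dict = field(default_factory=dict)  # in-memory pages, lost on crash
    dirty: set = field(default_factory=set)     # keys modified since their last flush
    disk: dict = field(default_factory=dict)    # data file, survives crash
    wal: list = field(default_factory=list)     # write-ahead log, survives crash
    redo_lsn: int = 0                           # redo point of the last *completed* checkpoint

    def write(self, key, value):
        # Write-ahead: append the log record before touching the page.
        self.wal.append((key, value))
        self.buffer[key] = value
        self.dirty.add(key)

    def begin_checkpoint(self):
        # The redo point is the current end of the log; writes keep flowing after this.
        return len(self.wal), sorted(self.dirty)

    def flush_some(self, pending, n):
        # Flush up to n pages with their *current* contents, like a background checkpointer.
        for key in pending[:n]:
            self.disk[key] = self.buffer[key]
            self.dirty.discard(key)
        return pending[n:]

    def finish_checkpoint(self, redo_lsn):
        # Persist the redo point only once every page dirty at the start is on disk.
        self.redo_lsn = redo_lsn

    def crash_and_recover(self):
        # Memory is gone: start from the data file, replay the log from the redo point.
        self.buffer = dict(self.disk)
        self.dirty = set()
        for key, value in self.wal[self.redo_lsn:]:
            self.buffer[key] = value
        return self.buffer


if __name__ == "__main__":
    db = ToyDB()
    for i in range(5):
        db.write(f"k{i}", i)
    redo, pending = db.begin_checkpoint()
    pending = db.flush_some(pending, 2)  # checkpointer makes partial progress...
    db.write("k0", 100)                  # ...while writes continue
    db.write("k9", 9)
    db.flush_some(pending, len(pending))
    db.finish_checkpoint(redo)
    db.write("k1", 111)                  # logged but never flushed
    expected = dict(db.buffer)
    assert db.crash_and_recover() == expected
```

The invariant that makes this safe is ordering: the redo point becomes durable only after every page that was dirty at checkpoint start has been flushed, so a crash before `finish_checkpoint` simply replays from the previous redo point. In this toy, replaying a record whose effect is already on disk is harmless because it overwrites the same value; a real engine needs an explicit idempotence mechanism for that.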
Why It Matters

You know that feeling when your database latency graph shows a cliff every few minutes? Often, that's checkpointing. Your database has gigabytes of dirty pages in memory that need to reach disk for crash recovery. The naive approach (pause all writes, flush everything, resume) gives you a consistent snapshot, but it also gives you stalls of several hundred milliseconds. For a system doing 50,000 writes per second, each stall shows up as a spike in p99 latency every time the checkpoint fires. Every major database has had to solve this, and the solutions are more varied than you'd expect.
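
A quick simulation shows why one stall is so visible on a per-second latency dashboard. It assumes the 50,000 writes per second from the text, a hypothetical 1 ms steady-state write latency, and a hypothetical 300 ms stop-the-world flush; every write that arrives during the stall waits until the stall ends. The names `latency_ms` and `p99_per_second` are illustrative only.

```python
import math

RATE = 50_000       # writes per second, from the text
BASE_MS = 1.0       # assumed steady-state write latency
STALL = (3.5, 3.8)  # assumed 300 ms stop-the-world flush (seconds since start)


def latency_ms(arrival_s):
    """Latency of one write: writes arriving during the stall wait for it to end."""
    start, end = STALL
    if start <= arrival_s < end:
        return BASE_MS + (end - arrival_s) * 1000
    return BASE_MS


def p99_per_second(seconds=6):
    """Nearest-rank p99 latency for each one-second window, as a dashboard plots it."""
    out = []
    for sec in range(seconds):
        window = sorted(latency_ms(sec + i / RATE) for i in range(RATE))
        out.append(window[math.ceil(0.99 * RATE) - 1])
    return out


if __name__ == "__main__":
    for sec, p99 in enumerate(p99_per_second()):
        print(f"second {sec}: p99 = {p99:.1f} ms")
```

Under these assumptions every second reports a 1 ms p99 except the one containing the stall, where roughly 30% of writes are delayed and p99 jumps to roughly 290 ms.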

How It Works

Think of it like photographing a moving crowd. You can't freeze everyone in place, so you use tricks: take multiple photos and stitch them together (PostgreSQL's WAL replay), have people step into a side room for their photo (SQLite's WAL file), or create a duplicate crowd that stands still while the original keeps moving (Redis's fork).

PostgreSQL records a 'redo point' in its write-ahead log, flushes dirty pages in the background while writes continue, and on recovery replays the log forward from that point. SQLite writes all changes to a separate WAL file; a checkpoint copies those pages back into the main database file, but only the ones no active reader still needs. Redis calls fork() to get a child process with a copy-on-write view of memory; the child serializes everything to disk while the parent keeps serving requests.
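
The fork-based primitive is easy to reproduce on any POSIX system (`os.fork` is unavailable on Windows). This is a minimal sketch in the spirit of Redis's background save, not Redis code: `bgsave` is a hypothetical helper, JSON stands in for the RDB format, and the write-to-temp-then-rename step is an assumed good practice so a reader never sees a half-written file.

```python
import json
import os
import tempfile


def bgsave(store, path):
    """Fork a child that writes `store` as it was at fork time; return the child's pid."""
    pid = os.fork()
    if pid == 0:  # child: sees a copy-on-write snapshot of the parent's memory
        code = 1
        try:
            tmp = f"{path}.tmp-{os.getpid()}"
            with open(tmp, "w") as f:
                json.dump(store, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, path)  # atomic rename: readers never see a partial file
            code = 0
        finally:
            os._exit(code)  # skip the parent's cleanup handlers and buffered output
    return pid  # parent: returns immediately and keeps serving writes


if __name__ == "__main__":
    store = {"a": 1, "b": 2}
    path = os.path.join(tempfile.mkdtemp(), "dump.json")
    pid = bgsave(store, path)
    store["a"] = 999  # the parent mutates right after the fork...
    store["c"] = 3
    os.waitpid(pid, 0)
    with open(path) as f:
        print(json.load(f))  # ...but the snapshot holds the fork-time state
```

One caveat when reading memory numbers from this sketch: CPython's reference counting writes to object headers even on reads, so the child dirties and copies far more pages than a C program walking the same data would.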

Key Takeaways
  • Three fundamental primitives — every non-blocking checkpoint is either WAL replay (PostgreSQL, WiredTiger), side-channel merge (SQLite, RocksDB), or fork-based snapshot (Redis), and understanding which primitive your da...
  • PostgreSQL fuzzy checkpointing — records a redo point, flushes dirty pages in background via bgwriter/checkpointer processes, spreads I/O over time with checkpoint_completion_target (default 0.9), and on recovery replay...
  • SQLite WAL mode — writes only to WAL file during transactions, reads check WAL first then main file, checkpoint copies pages back only up to minimum read mark, and PASSIVE mode never blocks but WAL can grow unboundedly ...
  • Redis BGSAVE fork strategy — fork() creates child with copy-on-write view of parent memory, child walks data structures and writes RDB sequentially, parent continues serving writes, and worst case memory doubles if ever...
  • RocksDB immutable memtable flush — active MemTable rotates to immutable when full, background thread flushes immutable to L0 SSTable, writes accumulate in new active MemTable during flush, and GetLiveFiles() creates ins...
  • WiredTiger hazard pointers — maintains two checkpoints (durable and in-progress), uses hazard pointers for lock-free reader synchronization, writes to new disk locations (append-only), and atomically updates metadata fi...
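
The SQLite behavior in the list above can be observed directly with Python's built-in `sqlite3` module. This sketch assumes a throwaway database in a temp directory and disables automatic checkpoints (`PRAGMA wal_autocheckpoint=0`) so that only explicit checkpoints run; `one` and `demo` are hypothetical helper names. `PRAGMA wal_checkpoint(PASSIVE)` returns a row of (busy flag, frames in the WAL, frames copied back into the database file).

```python
import os
import sqlite3
import tempfile


def one(conn, sql):
    """Run a query to completion and return its first row."""
    return conn.execute(sql).fetchall()[0]


def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    writer = sqlite3.connect(path, isolation_level=None)  # autocommit mode
    writer.execute("PRAGMA journal_mode=WAL")
    writer.execute("PRAGMA wal_autocheckpoint=0")         # checkpoint only on request
    writer.execute("CREATE TABLE t(x INTEGER)")
    for i in range(100):
        writer.execute("INSERT INTO t VALUES (?)", (i,))

    reader = sqlite3.connect(path, isolation_level=None)
    reader.execute("BEGIN")
    before = one(reader, "SELECT count(*) FROM t")[0]     # read snapshot starts here

    for i in range(100, 200):                             # frames land past the reader's mark
        writer.execute("INSERT INTO t VALUES (?)", (i,))

    partial = one(writer, "PRAGMA wal_checkpoint(PASSIVE)")  # stops at the reader's mark
    during = one(reader, "SELECT count(*) FROM t")[0]        # reader still sees its snapshot
    reader.execute("COMMIT")
    full = one(writer, "PRAGMA wal_checkpoint(PASSIVE)")     # now it can copy everything

    reader.close()
    writer.close()
    return before, during, partial, full


if __name__ == "__main__":
    before, during, partial, full = demo()
    print(f"reader saw {before} rows and still sees {during}")
    print(f"PASSIVE with reader open:  {partial}")
    print(f"PASSIVE after reader ends: {full}")
```

While the reader holds its snapshot, the checkpoint copies only frames up to the reader's read mark, so the copied count stays below the WAL frame count; once the reader commits, the next PASSIVE checkpoint catches up. Neither call waits, which is why a long-lived reader lets the WAL keep growing.
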
Should You Care?

Who It Is For

If you're a backend engineer who's ever stared at a latency spike and wondered 'is this checkpointing?', or a DBA tuning PostgreSQL's checkpoint_completion_target without understanding why it matters, this is for you. Especially valuable if you're choosing between databases and want to understand their persistence trade-offs. Not useful if you don't work with databases that have crash recovery re...
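
Why checkpoint_completion_target matters comes down to arithmetic. In a simplified model that ignores checkpoints triggered by `max_wal_size`, the checkpointer tries to finish its writes within `checkpoint_completion_target` × `checkpoint_timeout` seconds (`checkpoint_timeout` defaults to 5 minutes), so the target sets the average write rate. The function name and the 4 GiB dirty-buffer figure below are assumptions for illustration.

```python
def checkpoint_write_rate(dirty_mib, checkpoint_timeout_s=300, completion_target=0.9):
    """Average MiB/s needed to finish a timed checkpoint on schedule (simplified model)."""
    # Simplification: ignores max_wal_size-triggered checkpoints and OS write-back.
    budget_s = completion_target * checkpoint_timeout_s
    return dirty_mib / budget_s


if __name__ == "__main__":
    dirty = 4096  # assumed: 4 GiB of dirty buffers
    for target in (0.9, 0.5, 0.1):
        rate = checkpoint_write_rate(dirty, completion_target=target)
        print(f"completion_target={target}: {rate:.1f} MiB/s")
```

At the 0.9 default the checkpointer needs about 15 MiB/s on average; at 0.5 (the default before PostgreSQL 14) the same checkpoint needs about 27 MiB/s, packed into the first half of the interval, which competes harder with foreground I/O.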

Worth Exploring?

Yes, this is one of the clearest explanations of database checkpointing you'll find. The author works on Stripe's database engine team and has 10 years of infrastructure experience, and it shows—the article includes actual code snippets, disk layouts, and a comparison table of trade-offs. Published March 15, 2026, so it's current. The only caveat: it assumes you know what WAL and buffer pools are, so complete beginners should read the PostgreSQL docs first.
