R&D · Intermediate · 3 min read · Mar 24, 2026 · Updated Apr 2, 2026

How Databases Checkpoint to Disk Without Stopping the World

“Your database has gigabytes of dirty pages. They need to hit disk. Here's how 5 databases do it without pausing your app.”

Source · gauravsarma.com

“The key invariant is not 'all pages are consistent with each other.' It is 'all pages are at least as old as the redo point, and the WAL from the redo point forward is complete.'” (Gaurav Sarma, explaining PostgreSQL's fuzzy checkpointing)

You know that feeling when your database latency histogram shows a cliff every few minutes? That's often checkpointing. Your database has gigabytes of dirty pages in memory that need to hit disk for crash recovery. The naive approach (pause all writes, flush everything, resume) gives you a consistent snapshot but also stalls lasting several hundred milliseconds. For a system doing 50,000 writes per second, a 200 ms stall holds up roughly 10,000 writes, and it shows up in your p99 latency every time the checkpoint fires. Every major database has had to solve this, and the solutions are more varied than you'd expect.

databases · systems-design · postgresql · redis · sqlite · rocksdb · mongodb

Think of it like taking a photo of a moving crowd. You can't freeze everyone in place, so you use tricks: take multiple photos and stitch them together (PostgreSQL's WAL replay), have people step into a side room for their photo (SQLite's WAL file), or create a duplicate crowd that stands still while the original keeps moving (Redis's fork). PostgreSQL records a 'redo point' in its write-ahead log, flushes dirty pages in the background while writes continue, and on recovery replays the log forward from that point. SQLite writes all changes to a separate WAL file; the checkpoint merges pages back to the main file only when no readers need them. Redis calls fork() to get a child process with a copy-on-write view of memory, then the child serializes everything to disk while the parent keeps serving requests.
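The redo-point idea described above can be sketched as a toy model. This is a minimal illustration, not PostgreSQL's implementation: the `ToyDB` class, its fields, and the `concurrent_writes` parameter are all hypothetical names invented here, and real systems add LSNs, full-page writes, and fsync ordering that this sketch leaves out. It shows only why pages flushed while writes continue are still recoverable, provided the log is replayed from the redo point.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDB:
    """Toy page store with a write-ahead log (illustrative names only)."""
    buffer: dict = field(default_factory=dict)  # in-memory pages: id -> value
    disk: dict = field(default_factory=dict)    # durable pages: id -> value
    wal: list = field(default_factory=list)     # durable log of (id, value)
    dirty: set = field(default_factory=set)     # pages modified since last flush
    redo_point: int = 0                         # WAL index where recovery starts

    def write(self, page_id, value):
        self.wal.append((page_id, value))       # log the change first
        self.buffer[page_id] = value
        self.dirty.add(page_id)

    def checkpoint(self, concurrent_writes=()):
        """Flush dirty pages while new writes keep arriving (fuzzy)."""
        redo = len(self.wal)                    # 1. remember the redo point
        to_flush = sorted(self.dirty)           # 2. pages dirty at checkpoint start
        self.dirty -= set(to_flush)
        incoming = iter(concurrent_writes)
        for page_id in to_flush:
            w = next(incoming, None)            # a write lands between page flushes
            if w is not None:
                self.write(*w)
            self.disk[page_id] = self.buffer[page_id]  # 3. flush current contents
        for w in incoming:                      # writes left over after flushing
            self.write(*w)
        self.redo_point = redo                  # 4. publish the completed checkpoint

    def recover(self):
        """Rebuild state from disk pages plus WAL replay from the redo point."""
        pages = dict(self.disk)
        for page_id, value in self.wal[self.redo_point:]:
            pages[page_id] = value              # replay is idempotent per page
        return pages


db = ToyDB()
db.write("a", 1)
db.write("b", 2)
# Two writes arrive while the checkpoint is flushing pages "a" and "b".
db.checkpoint(concurrent_writes=[("a", 10), ("c", 3)])
recovered = db.recover()
```

In this run the on-disk pages are not mutually consistent (page "a" already holds a post-checkpoint value, page "c" is missing entirely), yet replaying the log from the redo point reproduces the in-memory state.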

01. Three fundamental primitives: every non-blocking checkpoint is either WAL replay (PostgreSQL, WiredTiger), side-channel merge (SQLite, RocksDB), or fork-based snapshot (Redis), and understanding which primitive your database uses tells yo...
02. PostgreSQL fuzzy checkpointing: records a redo point, flushes dirty pages in background via bgwriter/checkpointer processes, spreads I/O over time with checkpoint_completion_target (default 0.9), and on recovery replays WAL forward from r...
03. SQLite WAL mode: writes only to WAL file during transactions, reads check WAL first then main file, checkpoint copies pages back only up to minimum read mark, and PASSIVE mode never blocks but WAL can grow unboundedly with long-running re...
04. Redis BGSAVE fork strategy: fork() creates child with copy-on-write view of parent memory, child walks data structures and writes RDB sequentially, parent continues serving writes, and worst case memory doubles if every page modified duri...
05. RocksDB immutable memtable flush: active MemTable rotates to immutable when full, background thread flushes immutable to L0 SSTable, writes accumulate in new active MemTable during flush, and GetLiveFiles() creates instant checkpoint via ...
06. WiredTiger hazard pointers: maintains two checkpoints (durable and in-progress), uses hazard pointers for lock-free reader synchronization, writes to new disk locations (append-only), and atomically updates metadata file on checkpoint com...
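
The SQLite behavior in the list above can be observed with Python's built-in `sqlite3` module. The sketch below is a small experiment under stated assumptions, not code from the source article: the function name `wal_checkpoint_demo`, the table `t`, and the row count are made up for illustration. It uses the documented pragmas `journal_mode`, `wal_autocheckpoint`, and `wal_checkpoint(PASSIVE)`; the last returns a row of (busy, frames in WAL, frames checkpointed).

```python
import os
import sqlite3
import tempfile


def wal_checkpoint_demo():
    """Show a PASSIVE checkpoint being limited by an open read transaction."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        writer = sqlite3.connect(path)
        mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        writer.execute("PRAGMA wal_autocheckpoint=0")  # checkpoint manually only
        writer.execute("CREATE TABLE t (x INTEGER)")
        writer.commit()

        # Autocommit mode so the explicit BEGIN/ROLLBACK below are ours alone.
        reader = sqlite3.connect(path, isolation_level=None)
        reader.execute("BEGIN")
        reader.execute("SELECT count(*) FROM t").fetchall()  # pins a snapshot

        # New frames are appended to the WAL after the reader's snapshot.
        writer.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
        writer.commit()

        # PASSIVE does not wait for the reader; it copies back what it safely can.
        pinned = writer.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchone()

        reader.execute("ROLLBACK")  # release the snapshot
        released = writer.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchone()

        reader.close()
        writer.close()
        return mode, pinned, released


mode, pinned, released = wal_checkpoint_demo()
print(mode, pinned, released)
```

While the reader holds its snapshot, the checkpointed frame count stays below the WAL frame count; once the reader finishes, the same PASSIVE call can copy back the whole log. A reader that never finishes would keep the WAL from being fully checkpointed, which is the growth risk the list mentions.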
Who it’s for

If you're a backend engineer who's ever stared at a latency spike and wondered 'is this checkpointing?', or a DBA tuning PostgreSQL's checkpoint_completion_target without understanding why it matters, this is for you. Especially valuable if you're choosing between databases and want to understand their persistence trade-offs. Not useful if you don't work with databases that have crash recovery requirements, or if you only use managed database services where you can't tune checkpoint behavior.

Worth exploring

Yes, this is one of the clearest explanations of database checkpointing you'll find. The author works on Stripe's database engine team and has 10 years of infrastructure experience, and it shows—the article includes actual code snippets, disk layouts, and a comparison table of trade-offs. Published March 15, 2026, so it's current. The only caveat: it assumes you know what WAL and buffer pools are, so complete beginners should read the PostgreSQL docs first.
