OpenAI just revealed exactly how Codex works — and the model is the easy part
Snaplyze Digest
Tech Products · Intermediate · 2 min read · Mar 19, 2026 · Updated Mar 20, 2026


“OpenAI tried MCP for Codex. It didn't work. Here's what they built instead.”

In Short

The codex-1 model is just one component. The real engineering went into the agent loop, prompt management, and a custom protocol that MCP couldn't handle. OpenAI rejected MCP because it couldn't support streaming progress, mid-task approvals, or code diffs — so they built their own JSON-RPC protocol. Because each turn resends the full conversation history, total computation grows quadratically with conversation length, but prompt caching keeps it linear. 66k GitHub stars in under a year.
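The quadratic-vs-linear point can be sanity-checked with quick arithmetic. This is a simplified cost model for illustration, not OpenAI's actual billing or caching implementation:

```python
def tokens_processed(turn_tokens: int, n_turns: int, cached: bool = False) -> int:
    """Total prompt tokens processed across a conversation.

    Simplified model: every turn appends `turn_tokens` of new history.
    Without caching, each turn re-processes all prior history, so the
    total is quadratic in the number of turns; with prefix caching,
    only the new suffix is processed each turn, so the total is linear.
    """
    if cached:
        return turn_tokens * n_turns
    return sum(turn_tokens * t for t in range(1, n_turns + 1))

# 1,000 tokens per turn over a 100-turn session:
print(tokens_processed(1000, 100))               # 5,050,000 tokens uncached
print(tokens_processed(1000, 100, cached=True))  # 100,000 tokens with caching
```

A 50x difference at 100 turns — which is why long agent sessions are only affordable with caching.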

ai · agents · openai · coding · llm
Why It Matters
The practical pain point this digest is really about.

You know that feeling when you ask an AI to fix a bug and it gives you code that doesn't work because it can't run your tests? Before Codex: you'd paste code back and forth, manually run tests, and iterate. Now: Codex runs in an isolated sandbox with your full repo, executes tests, reads linter output, and keeps iterating until tests pass — all while you work on something else.

How It Works
The mechanism, architecture, or workflow behind it.

Think of Codex as a tireless junior developer in a sandbox. You give it a task ('fix the auth bug'), and it enters an agent loop: read files, form a plan, run commands, see what happens, adjust, repeat. Each turn can involve dozens of tool calls (shell commands, file edits, test runs) before it responds. The prompt stacks context like layers: system rules, your AGENTS.md instructions, sandbox permissions, tool definitions, and conversation history. When the context window fills up, Codex 'compacts' the conversation — replacing full history with an encrypted summary that preserves the model's understanding.
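The loop described above can be sketched in a few lines. This is a minimal illustration of the pattern, not Codex's actual implementation — the function and message names here are hypothetical:

```python
def agent_loop(task, tools, model, max_turns=50):
    """Alternate model turns and tool calls until the model stops
    requesting tools (i.e., it considers the task done).

    `model` maps a message history to {"text": ..., "tool_calls": [...]};
    `tools` maps tool names to callables. Both are stand-ins for the
    real model API and sandboxed shell/file/test tools.
    """
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = model(history)  # model reads history, plans, may request tools
        history.append({"role": "assistant", "content": reply["text"]})
        if not reply.get("tool_calls"):
            return reply["text"]  # no more tool calls: the task is finished
        for call in reply["tool_calls"]:  # shell command, file edit, test run...
            result = tools[call["name"]](call["args"])
            history.append({"role": "tool", "content": result})  # feed output back
    raise RuntimeError("max turns exceeded")
```

Each iteration appends to `history`, which is exactly why the prompt-layering and compaction machinery matters: the loop itself is simple, but the context it accumulates is not.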

Key Takeaways
7 fast bullets that make the core value obvious.
  • Agent loop — why YOU care: Codex doesn't just generate code, it executes a reasoning loop. It reads files, runs shell commands, executes tests, and iterates until the task is done. You get working code, not code that might not even run.
  • AGENTS.md files — why YOU care: Drop a text file in your repo with project-specific instructions (test commands, coding conventions, architecture notes). Codex reads it and follows your rules. Better output without repeating yourself every session.
  • Multi-surface architecture — why YOU care: Same agent runs in terminal, VS Code, web, and desktop. Switch contexts without losing state. Your task continues even if you close the browser tab.
  • Prompt caching — why YOU care: each turn resends all prior history, so total computation grows quadratically with conversation length; caching keeps it linear. Long sessions stay fast and affordable.
  • Isolated sandboxes — why YOU care: Each task runs in its own cloud container with your repo preloaded. Codex can't access external APIs or your local machine. Security by design.
  • Parallel tasks — why YOU care: Assign multiple tasks at once. 'Fix bug in auth, add tests for payments, refactor utils.' Codex works on them simultaneously while you focus elsewhere.
  • Bidirectional JSON-RPC — why YOU care: The server can ask for approval mid-task ('run this destructive command?'). You stay in control without babysitting every step.
Should You Care?
Audience fit, decision signal, and the original source in one place.

Who It Is For

If you're a developer who spends time on repetitive, well-scoped tasks like refactoring, writing tests, fixing bugs, or triaging on-call issues — this is for you. OpenAI's own engineers use it to offload work that breaks focus. Not useful yet if you need image inputs for frontend work or want to course-correct mid-task. Best for teams with good test coverage and clear coding conventions.

Worth Exploring?

Yes — if you have a ChatGPT Pro, Plus, or Enterprise subscription, you already have access. The CLI is open source (66k stars) and free to try. The engineering blog posts reveal production-grade patterns for building agents: prompt layering, context management, protocol design. Even if you don't use Codex, the architecture is worth studying. The main gotcha: usage limits on Plus plans have been fluctuating, and cloud tasks take longer than interactive editing.
