Tech Products · intermediate · 2 min read · Mar 19, 2026 · Updated Mar 20, 2026

OpenAI just revealed exactly how Codex works — and the model is the easy part

“OpenAI tried MCP for Codex. It didn't work. Here's what they built instead.”

Source · blog.bytebytego.com

“When the Codex team needed their agent to work inside VS Code, they first tried the obvious approach and exposed it through MCP. It didn't work. — ByteByteGo”

You know that feeling when you ask an AI to fix a bug and it gives you code that doesn't work because it can't run your tests? Before Codex: you'd paste code back and forth, manually run tests, and iterate. Now: Codex runs in an isolated sandbox with your full repo, executes tests, reads linter output, and keeps iterating until tests pass — all while you work on something else.

ai · agents · openai · coding · llm · devtools · architecture

Think of Codex as a tireless junior developer in a sandbox. You give it a task ('fix the auth bug'), and it enters an agent loop: read files, form a plan, run commands, see what happens, adjust, repeat. Each turn can involve dozens of tool calls (shell commands, file edits, test runs) before it responds. The prompt stacks context like layers: system rules, your AGENTS.md instructions, sandbox permissions, tool definitions, and conversation history. When the context window fills up, Codex 'compacts' the conversation — replacing full history with an encrypted summary that preserves the model's understanding.
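The loop, the layered prompt, and compaction described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how Codex is implemented: `PromptLayers`, `fake_model`, `run_tool`, and `MAX_CONTEXT_CHARS` are hypothetical names, the "model" is a stub that asks for two tool calls and then answers, and compaction is a plain-text placeholder rather than an encrypted summary.

```python
from dataclasses import dataclass, field

# Every name here is a hypothetical stand-in for illustration only.
MAX_CONTEXT_CHARS = 400  # toy budget standing in for a token limit


@dataclass
class PromptLayers:
    system_rules: str
    agents_md: str
    sandbox_permissions: str
    tool_definitions: str
    history: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Stable layers come first; the growing history comes last.
        fixed = [self.system_rules, self.agents_md,
                 self.sandbox_permissions, self.tool_definitions]
        return "\n".join(fixed + self.history)

    def compact(self) -> None:
        # Stand-in for compaction: replace the full history with one
        # short summary item (the article says the real one is encrypted).
        summary = f"[summary of {len(self.history)} earlier items]"
        self.history = [summary]


def fake_model(prompt: str, step: int) -> dict:
    # Stub model: requests two tool calls, then returns a final answer.
    if step < 2:
        return {"type": "tool_call", "command": f"run-tests --attempt {step}"}
    return {"type": "final", "text": "Tests pass; auth bug fixed."}


def run_tool(command: str) -> str:
    # Stub sandbox: a real agent would execute the command here.
    return f"output of `{command}`: 1 failed"


def run_turn(layers: PromptLayers, task: str, max_steps: int = 10) -> str:
    layers.history.append(f"user: {task}")
    for step in range(max_steps):
        # Compact before calling the model if the prompt is over budget.
        if len(layers.render()) > MAX_CONTEXT_CHARS:
            layers.compact()
        action = fake_model(layers.render(), step)
        if action["type"] == "final":
            layers.history.append(f"assistant: {action['text']}")
            return action["text"]
        # Feed the tool result back so the next step can react to it.
        result = run_tool(action["command"])
        layers.history.append(f"tool: {action['command']} -> {result}")
    return "stopped: step limit reached"
```

Calling `run_turn(layers, "fix the auth bug")` walks through two stubbed tool calls before the final reply, which mirrors the point that one turn can contain many tool calls before the user sees a response.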

01
Agent loop — why YOU care: Codex doesn't just generate code, it executes a reasoning loop. It reads files, runs shell commands, executes tests, and iterates until the task is done. You get working code, not code that might work.
02
AGENTS.md files — why YOU care: Drop a text file in your repo with project-specific instructions (test commands, coding conventions, architecture notes). Codex reads it and follows your rules. Better output without repeating yourself.
03
Multi-surface architecture — why YOU care: Same agent runs in terminal, VS Code, web, and desktop. Switch contexts without losing state. Your task continues even if you close the browser tab.
04
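Prompt caching — why YOU care: Each turn resends the whole history, so the total amount of data sent over a conversation grows quadratically with the number of turns. Caching the unchanged prefix means only the new part of each request needs fresh computation, which keeps the work roughly linear. Long sessions stay fast and affordable.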
05
Isolated sandboxes — why YOU care: Each task runs in its own cloud container with your repo preloaded. Codex can't access external APIs or your local machine. Security by design.
06
Parallel tasks — why YOU care: Assign multiple tasks at once. 'Fix bug in auth, add tests for payments, refactor utils.' Codex works on them simultaneously while you focus elsewhere.
07
Bidirectional JSON-RPC — why YOU care: The server can ask for approval mid-task ('run this destructive command?'). You stay in control without babysitting every step.
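To make the approval flow in item 07 concrete, here is a minimal sketch of a server-initiated request and the client's reply, using the standard JSON-RPC 2.0 message shape. The method name `approval/request`, the `command` parameter, and the `decision` result field are assumptions made for illustration; the source does not give the actual Codex message names.

```python
import json

# Hypothetical method name; the real Codex protocol may use different names.
APPROVAL_METHOD = "approval/request"


def make_approval_request(request_id: int, command: str) -> str:
    # Server -> client: a JSON-RPC 2.0 request asking for approval.
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": APPROVAL_METHOD,
        "params": {"command": command},
    })


def handle_server_message(raw: str, approve) -> str:
    # Client side: reply to a server-initiated request with a response
    # carrying the same id, as JSON-RPC 2.0 requires.
    msg = json.loads(raw)
    if msg.get("method") != APPROVAL_METHOD:
        return json.dumps({
            "jsonrpc": "2.0",
            "id": msg.get("id"),
            "error": {"code": -32601, "message": "Method not found"},
        })
    decision = "approved" if approve(msg["params"]["command"]) else "denied"
    return json.dumps({
        "jsonrpc": "2.0",
        "id": msg["id"],
        "result": {"decision": decision},
    })


def deny_destructive(command: str) -> bool:
    # Toy policy standing in for a human clicking approve or deny.
    return not command.startswith("rm ")
```

The key point is direction: in a plain request/response protocol only the client asks questions, whereas here the server sends a request and waits for the client's response with the matching `id`. That is what lets the agent pause mid-task for a decision.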
Who it’s for

If you're a developer who spends time on repetitive, well-scoped tasks like refactoring, writing tests, fixing bugs, or triaging on-call issues — this is for you. OpenAI's own engineers use it to offload work that breaks focus. Not useful yet if you need image inputs for frontend work or want to course-correct mid-task. Best for teams with good test coverage and clear coding conventions.

Worth exploring

Yes — if you have a ChatGPT Pro, Plus, or Enterprise subscription, you already have access. The CLI is open source (66k stars) and free to try. The engineering blog posts reveal production-grade patterns for building agents: prompt layering, context management, protocol design. Even if you don't use Codex, the architecture is worth studying. The main gotcha: usage limits on Plus plans have been fluctuating, and cloud tasks take longer than interactive editing.
