OpenAI just revealed exactly how Codex works — and the model is the easy part
Snaplyze Digest
Tech Products · Intermediate · 2 min read · Mar 19, 2026 · Updated Mar 20, 2026


“OpenAI tried MCP for Codex. It didn't work. Here's what they built instead.”

In Short

The codex-1 model is just one component. The real engineering went into the agent loop, prompt management, and a custom protocol that MCP couldn't handle. OpenAI rejected MCP because it couldn't support streaming progress, mid-task approvals, or code diffs — so they built their own JSON-RPC protocol. Because each turn resends the full conversation history, total computation grows quadratically with conversation length, but prompt caching keeps it linear. 66k GitHub stars in under a year.
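The quadratic-vs-linear point can be sanity-checked with quick arithmetic. This is a simplified cost model for illustration, not OpenAI's actual billing or caching implementation:

```python
def tokens_processed(turn_tokens: int, n_turns: int, cached: bool = False) -> int:
    """Total prompt tokens processed across a conversation.

    Simplified model: every turn appends `turn_tokens` of new history.
    Without caching, each turn re-processes all prior history, so the
    total is quadratic in the number of turns; with prefix caching,
    only the new suffix is processed each turn, so the total is linear.
    """
    if cached:
        return turn_tokens * n_turns
    return sum(turn_tokens * t for t in range(1, n_turns + 1))

# 1,000 tokens per turn over a 100-turn session:
print(tokens_processed(1000, 100))               # 5,050,000 tokens uncached
print(tokens_processed(1000, 100, cached=True))  # 100,000 tokens with caching
```

A 50x difference at 100 turns — which is why long agent sessions are only affordable with caching.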

ai · agents · openai · coding · llm
Why It Matters
The practical pain point this digest is really about.

You know that feeling when you ask an AI to fix a bug and it gives you code that doesn't work because it can't run your tests? Before Codex: you'd paste code back and forth, manually run tests, and iterate. Now: Codex runs in an isolated sandbox with your full repo, executes tests, reads linter output, and keeps iterating until tests pass — all while you work on something else.

How It Works
The mechanism, architecture, or workflow behind it.

Think of Codex as a tireless junior developer in a sandbox. You give it a task ('fix the auth bug'), and it enters an agent loop: read files, form a plan, run commands, see what happens, adjust, repeat. Each turn can involve dozens of tool calls (shell commands, file edits, test runs) before it responds. The prompt stacks context like layers: system rules, your AGENTS.md instructions, sandbox permissions, tool definitions, and conversation history. When the context window fills up, Codex 'compacts' the conversation — replacing full history with an encrypted summary that preserves the model's understanding.
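The loop described above can be sketched in a few lines. This is a minimal illustration of the pattern, not Codex's actual implementation — the function and message names here are hypothetical:

```python
def agent_loop(task, tools, model, max_turns=50):
    """Alternate model turns and tool calls until the model stops
    requesting tools (i.e., it considers the task done).

    `model` maps a message history to {"text": ..., "tool_calls": [...]};
    `tools` maps tool names to callables. Both are stand-ins for the
    real model API and sandboxed shell/file/test tools.
    """
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = model(history)  # model reads history, plans, may request tools
        history.append({"role": "assistant", "content": reply["text"]})
        if not reply.get("tool_calls"):
            return reply["text"]  # no more tool calls: the task is finished
        for call in reply["tool_calls"]:  # shell command, file edit, test run...
            result = tools[call["name"]](call["args"])
            history.append({"role": "tool", "content": result})  # feed output back
    raise RuntimeError("max turns exceeded")
```

Each iteration appends to `history`, which is exactly why the prompt-layering and compaction machinery matters: the loop itself is simple, but the context it accumulates is not.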

Key Takeaways
7 fast bullets that make the core value obvious.
  • Agent loop — why YOU care: Codex doesn't just generate code, it executes a reasoning loop. It reads files, runs shell commands, executes tests, and iterates until the task is done. You get working code, not code that might not even run.
  • AGENTS.md files — why YOU care: Drop a text file in your repo with project-specific instructions (test commands, coding conventions, architecture notes). Codex reads it and follows your rules. Better output without repeating yourself every session.
  • Multi-surface architecture — why YOU care: Same agent runs in terminal, VS Code, web, and desktop. Switch contexts without losing state. Your task continues even if you close the browser tab.
  • Prompt caching — why YOU care: each turn resends all prior history, so total computation grows quadratically with conversation length; caching keeps it linear. Long sessions stay fast and affordable.
  • Isolated sandboxes — why YOU care: Each task runs in its own cloud container with your repo preloaded. Codex can't access external APIs or your local machine. Security by design.
  • Parallel tasks — why YOU care: Assign multiple tasks at once. 'Fix bug in auth, add tests for payments, refactor utils.' Codex works on them simultaneously while you focus elsewhere.
  • Bidirectional JSON-RPC — why YOU care: The server can ask for approval mid-task ('run this destructive command?'). You stay in control without babysitting every step.
Should You Care?
Audience fit, decision signal, and the original source in one place.

Who It Is For

If you're a developer who spends time on repetitive, well-scoped tasks like refactoring, writing tests, fixing bugs, or triaging on-call issues — this is for you. OpenAI's own engineers use it to offload work that breaks focus. Not useful yet if you need image inputs for frontend work or want to course-correct mid-task. Best for teams with good test coverage and clear coding conventions.

Worth Exploring?

Yes — if you have a ChatGPT Pro, Plus, or Enterprise subscription, you already have access. The CLI is open source (66k stars) and free to try. The engineering blog posts reveal production-grade patterns for building agents: prompt layering, context management, protocol design. Even if you don't use Codex, the architecture is worth studying. The main gotcha: usage limits on Plus plans have been fluctuating, and cloud tasks take longer than interactive editing.
