
OpenAI Codex’s Sandbox and Approval Model: What’s Isolated, What Isn’t

Anomity Research Security Researcher, Anomity · Jun 7, 2026 · 7 min read

OpenAI Codex ships with two distinct controls that engineers routinely conflate: a sandbox that decides what the agent can technically do to the filesystem and network, and an approval policy that decides when Codex has to stop and ask. They are separate dials, enforced at different layers, and confusing them is the fastest way to end up running an agent with more access than you intended.

The good news is that Codex’s defaults are conservative and the enforcement is real, not advisory. The harder news is that a sandbox is a per-process boundary, not a fleet control - it tells you nothing about which mode any given developer is running, or what the agent did inside the boundary it was granted. This is the same gap we walked through for the broader category in securing AI coding agents and CLIs. Here we look at what Codex isolates, how it does so, and where responsibility quietly shifts back to you.

The two dials: sandbox mode and approval policy

Per OpenAI’s documentation, Codex exposes three sandbox modes and three approval policies, and they are orthogonal. The sandbox enforces what is technically possible; the approval policy decides when Codex must ask first. You can pair almost any sandbox with almost any approval setting, so the combination matters more than either value alone.

The sandbox modes are:

read-only. Codex can inspect files but cannot edit them or run commands without approval.
workspace-write. Codex can read files, edit within the workspace, and run routine local commands inside that boundary.
danger-full-access. No filesystem or network boundary at all.

The approval policies are:

untrusted. Only known-safe read operations run automatically; anything that can mutate state or trigger external execution requires approval.
on-request. Codex works inside the sandbox and asks when it needs to go beyond that boundary.
never. No approval interruptions.

The danger-full-access mode paired with never is what the docs surface under the --yolo alias and explicitly mark *not recommended*.

Combination	What Codex can do without asking	Where it stops
read-only + on-request	Read files, answer questions	Any edit or command prompts for approval
workspace-write + on-request	Read, edit in workspace, run local commands	Edits outside the workspace and network access prompt for approval
workspace-write + never	Everything inside the sandbox, silently	Hard sandbox boundary only; no prompt
danger-full-access + never	Anything, including network and host writes	Nothing is enforced

The row that deserves attention is workspace-write + never. The agent runs unattended but is still fenced by the OS-level sandbox - a reasonable posture for CI. The risk is that nothing in the developer’s shell announces which row is live. That visibility gap is the throughline of this post, and the reason we treat the fleet inventory as the starting point rather than the sandbox itself.
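To make the table concrete, here is a minimal sketch that maps a sandbox mode and approval policy to a coarse posture label. The mode and policy strings come from the text above; the function name posture and the four labels are hypothetical, chosen only for illustration, and are not part of Codex.

```python
from enum import Enum


class Sandbox(str, Enum):
    READ_ONLY = "read-only"
    WORKSPACE_WRITE = "workspace-write"
    FULL_ACCESS = "danger-full-access"


class Approval(str, Enum):
    UNTRUSTED = "untrusted"
    ON_REQUEST = "on-request"
    NEVER = "never"


def posture(sandbox: Sandbox, approval: Approval) -> str:
    """Return a coarse, hypothetical label for a sandbox/approval pair."""
    if sandbox is Sandbox.FULL_ACCESS:
        # No filesystem or network boundary exists in this mode.
        if approval is Approval.NEVER:
            return "unenforced"      # the --yolo combination
        return "no-sandbox"          # approval policy is the only control left
    if approval is Approval.NEVER:
        return "sandbox-only"        # fenced by the OS sandbox, never prompts
    return "sandbox-and-prompt"      # fenced, and escalations reach a human


if __name__ == "__main__":
    for s in Sandbox:
        for a in Approval:
            print(f"{s.value} + {a.value}: {posture(s, a)}")
```

The point of the labels is that "sandbox-only" and "unenforced" both run silently, which is exactly why they are hard to tell apart from outside the process.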

How the sandbox is enforced (it’s OS-native)

Codex does not roll its own isolation. According to the sandboxing docs, it builds on the platform’s own kernel-level primitives, which is what makes the boundary meaningful rather than cosmetic. On macOS it uses the built-in Seatbelt framework. On Linux and WSL2 it relies on the bubblewrap package and unprivileged user namespaces. On Windows it uses the native Windows sandbox in PowerShell, with the Linux implementation available through WSL2.

In workspace-write, the default posture pairs filesystem writes scoped to the workspace with network access off by default. You can extend the writable area through sandbox_workspace_write.writable_roots without dropping to full access - widen exactly the directories a build needs rather than removing the boundary. When an operation exceeds the sandbox - editing outside the workspace, reaching the network, or running a command flagged untrusted - Codex escalates to the approval flow rather than failing silently. That escalation-versus-silent-execution distinction is what made the injection chain in the Codex branch-name command injection writeup so damaging: the dangerous step never surfaced to a human.

The two-phase cloud runtime: setup has network, the agent doesn’t

Codex’s cloud environments add a second layer of isolation that is easy to miss because it is temporal rather than spatial. Execution splits into two phases. The setup phase runs your configuration scripts with internet access enabled so dependencies can install. The agent phase then runs in the prepared container with internet access off by default, though you can configure limited or unrestricted access per environment.

The detail worth internalizing is how secrets are handled across that seam. The docs distinguish two things: environment variables, which persist through both phases, and secrets, which are encrypted at rest, decrypted only for task execution, available only to setup scripts, and - in OpenAI’s phrasing - removed before the agent phase starts. A credential needed to npm install a private package is present while dependencies resolve, then gone before the model-driven phase begins, which shrinks the window in which a prompt-injected agent can read a live secret. It is the structural opposite of the failure mode in the multi-agent comment-and-control credential theft analysis, where injected instructions reached credentials sitting in the agent’s reach.
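The split between environment variables and secrets can be modeled in a few lines. This is a toy model of the behavior described above, not OpenAI’s implementation; the function phase_environment, the phase names as strings, and the sample variable names are all hypothetical.

```python
def phase_environment(env_vars: dict, secrets: dict, phase: str) -> dict:
    """Return the variables visible in a given phase of this toy model.

    Setup sees environment variables and secrets; the agent phase sees
    environment variables only, mirroring the described removal of secrets.
    """
    if phase == "setup":
        return {**env_vars, **secrets}
    if phase == "agent":
        return dict(env_vars)
    raise ValueError(f"unknown phase: {phase!r}")


if __name__ == "__main__":
    env_vars = {"NODE_ENV": "test"}        # illustrative values only
    secrets = {"NPM_TOKEN": "example"}     # placeholder, not a real token
    print("setup:", sorted(phase_environment(env_vars, secrets, "setup")))
    print("agent:", sorted(phase_environment(env_vars, secrets, "agent")))
```

One consequence the model makes visible: anything stored as a plain environment variable is still present in the agent phase, so credentials belong in secrets, not in environment variables.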

Where it breaks: the gaps a fleet still owns

None of the above is a knock on Codex. The sandbox is real, the defaults are sane, and the two-phase model is good engineering. But every control here is scoped to a single process on a single endpoint, and several gaps survive that scope:

Configuration is per-invocation and invisible to the org. workspace-write + never and danger-full-access + never look identical from the outside - a quiet terminal. Nothing centrally records which mode ran, so the tool alone cannot tell you how many endpoints ran Codex with the sandbox off this week.
The sandbox protects the host, not the workspace. In workspace-write the agent has full read and edit rights inside the repo by design - source, local .env files, and checked-in config are all in scope. A prompt-injected agent that stays inside the boundary can still do real damage; the boundary was never meant to stop it.
Secret removal is a cloud-runtime property, not a local one. The strip-before-agent-phase guarantee applies to Codex cloud environments. A developer running Codex locally against a shell full of exported API keys gets none of that isolation.
Network escalation is a decision, not a log. Approving network access answers the prompt in front of one engineer; it leaves no durable, queryable record a reviewer or auditor can pull weeks later.
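For the local-secrets gap in particular, a quick pre-flight check is easy to sketch. The helper below lists environment variable names that look like credentials, so a developer can see what a locally launched agent process would inherit. The name find_secret_like and the keyword list are assumptions for illustration; a name-based heuristic will miss some secrets and flag some harmless variables.

```python
import os
import re

# Heuristic only: matches on variable names, never inspects values.
SECRET_NAME = re.compile(r"KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL", re.IGNORECASE)


def find_secret_like(env: dict) -> list:
    """Return sorted variable names that look like they hold credentials."""
    return sorted(name for name in env if SECRET_NAME.search(name))


if __name__ == "__main__":
    flagged = find_secret_like(dict(os.environ))
    if flagged:
        print("Variables a local agent process would inherit:")
        for name in flagged:
            print(f"  {name}")
    else:
        print("No secret-like variable names found.")
```

Running this in the shell that will launch the agent gives a concrete list to unset or move into a secrets manager before the session starts.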

The pattern is consistent with every coding agent we examine: the vendor controls the runtime, but the *fleet posture* - who is running what, in which mode, against which data - is unowned by default. A single misconfigured ~/.codex/config.toml stays invisible until something goes wrong.

What to check across your fleet

Translate the model above into a small set of questions you should be able to answer for every endpoint running Codex, not just your own:

Which sandbox mode and approval policy is configured - and is danger-full-access or never set anywhere it shouldn’t be?
On endpoints where the agent phase has network enabled, is that a deliberate, reviewed exception or config drift?
Are local secrets reachable from the workspace Codex is writing in, given local runs don’t get the cloud secret-strip behavior?
When a developer approved a network or out-of-workspace escalation, is there a record of it that survives the session?

The first two are inventory and classification problems; the last two are runtime and audit problems. Codex gives you the enforcement primitives - strong ones - but answering these at fleet scale needs something watching the endpoint, not the model, as we covered for the broader CLI category in securing AI coding agents and CLIs.
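The audit question comes down to writing each approval somewhere that outlives the terminal. The sketch below appends one JSON line per escalation decision and reads them back. It is a generic pattern, not a Codex feature; the field names and the functions build_record, append_record, and load_records are hypothetical.

```python
import json
from datetime import datetime, timezone


def build_record(user: str, kind: str, detail: str, decision: str) -> dict:
    """Build one audit record for an escalation decision."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "kind": kind,          # e.g. "network" or "out-of-workspace"
        "detail": detail,      # the command or path that triggered the prompt
        "decision": decision,  # e.g. "approved" or "denied"
    }


def append_record(path: str, record: dict) -> None:
    """Append the record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def load_records(path: str) -> list:
    """Read all records back, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

An append-only JSON Lines file is the simplest durable form; in practice the same record would be shipped off the endpoint, since a log the developer can edit is weak evidence.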

Making the sandbox layer visible and governable

This is the layer Anomity is built to surface. It inventories Codex alongside the other AI artifacts on each managed endpoint and classifies them, so the sandbox and approval configuration becomes a fact you can query rather than a per-developer choice you hope is set correctly (fleet inventory). On agents that expose a hook, it can return allow, deny, or log on each tool call before it runs, turning an in-session approval prompt into an enforced, org-wide policy (allow/deny/log at the hook). Every decision lands in a queryable 90-day audit trail and routes to your SIEM, Slack, or Jira, so an approved network escalation is no longer a private moment in one terminal (audit trail). Codex isolates the process well; Anomity makes that isolation - and its exceptions - visible across the fleet. If that gap is yours to close, request early access.