OpenAI Codex Across CLI, Cloud, and IDE: Where the Security Model Differs

OpenAI Codex is not one product with one security model. It is the same agent wearing three different jackets: a local CLI, a cloud environment driven from the web and GitHub, and an IDE extension. Each surface decides three things differently: where the code actually runs, what the sandbox and approval layer enforce, and how much of the network the agent can reach. A policy that is correct for one surface can be quietly wrong for the other two.

That matters because most teams reason about Codex as a single tool. They read about the cloud environment's locked-down defaults, assume the same posture applies to the laptop, and never check. The CLI's defaults are deliberately lower-friction, and the gap between them is where untrusted code and prompt injection get traction. We saw that mismatch in the Codex branch-name command injection that leaked a GitHub token, and the broader pattern in our note on securing AI coding agents and CLIs. This piece compares the three surfaces against the official Codex docs and shows why one policy has to span all of them.

Two layers: the sandbox and the approval policy

Codex documents its protection as two distinct layers, and keeping them separate is the key to reasoning about any surface. The sandbox is the technical boundary: what the agent can read, write, and reach when it executes a model-generated command. The approval policy is the procedural boundary: when Codex must stop and ask a human before crossing the sandbox edge.

On the CLI, both layers live in config.toml. The sandbox_mode key takes read-only, workspace-write, or danger-full-access; the middle value is the default low-friction mode: read anywhere, edit inside the workspace, run routine local commands. The approval_policy key takes untrusted, on-request, never, or granular. The combination an engineer picks on their own laptop is invisible to everyone else unless something inventories it, which is the problem our runtime governance layer is built to close.

Where the code runs, surface by surface

The single most important security variable is location of execution, because it determines the blast radius of a bad command. The three Codex surfaces land in three very different places.

  • CLI executes on the developer's own machine. The same endpoint holds SSH keys, cloud credentials, browser session cookies, and source for every repo the engineer touches. The OS sandbox is the only wall between a model-generated command and that material.
  • Cloud executes in an ephemeral container. Codex creates a container, checks out the repo at the selected branch or commit SHA, runs a setup script, then runs the agent loop. The container is disposable and the result surfaces as a diff, so a destructive command is contained to throwaway infrastructure rather than a laptop full of secrets.
  • IDE runs the agent inside the editor but executes commands against the same local workspace and the same OS-level sandbox the CLI uses. The convenience is a tighter loop with the open project; the security posture inherits the laptop's, not the cloud's.

Cloud is the most contained surface by design; the two local surfaces share the riskiest execution context. Teams that pilot Codex in the cloud and then roll it out to laptops are not extending the same security model - they are switching to a weaker one, often without noticing. That is the surface-confusion behind the multi-agent prompt-injection credential theft we analyzed, where the controls people assumed were active did not apply to the surface under attack.

What the sandbox actually enforces on each OS

The CLI and IDE depend on OS-native isolation, and the mechanism changes with the platform - which means the strength of the wall changes too. On macOS, Codex uses the built-in Seatbelt framework via sandbox-exec, enforced automatically with no prerequisites. On Linux and WSL2 it uses bubblewrap for user-namespace isolation with seccomp filtering; the docs note Codex uses the first bwrap executable on PATH, and that AppArmor on some distributions can interfere unless the bwrap-userns-restrict profile is loaded. If bubblewrap is missing, Codex falls back to a bundled helper that needs unprivileged user-namespace support.

Two operational facts follow from this for anyone running Codex locally at scale. First, the sandbox is only as strong as the host's configuration: a distro that blocks unprivileged namespaces or lacks bubblewrap changes the effective boundary. Second, sandbox_mode = "danger-full-access" removes the filesystem and network boundary entirely - a legitimate setting that turns the sandbox off. You cannot know which endpoints sit in that state by reading the docs; you have to look at the fleet inventory of what is actually configured where.
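The first fact can be checked mechanically. The sketch below collects a few host facts relevant to the Linux sandbox; it is not how Codex itself probes the host. The function name and the returned fields are hypothetical, and the sysctl path exists only on some Debian- and Ubuntu-derived kernels, so its absence proves nothing on its own.

```python
import shutil
import sys
from pathlib import Path


def local_sandbox_prerequisites() -> dict:
    """Collect host facts that affect the local Codex sandbox (hypothetical helper)."""
    facts = {"platform": sys.platform, "bwrap_path": None, "userns_sysctl": None}
    if sys.platform.startswith("linux"):
        # shutil.which returns the first match on PATH, or None if there is none.
        facts["bwrap_path"] = shutil.which("bwrap")
        # Distro-specific switch for unprivileged user namespaces ("1" = enabled).
        sysctl = Path("/proc/sys/kernel/unprivileged_userns_clone")
        try:
            facts["userns_sysctl"] = sysctl.read_text().strip()
        except OSError:
            pass  # file absent or unreadable on this kernel; leave as None
    return facts


facts = local_sandbox_prerequisites()
print(facts)
```

On macOS the two Linux-specific fields stay None, which matches the point above that Seatbelt needs no prerequisites.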

Network exposure: the sharpest difference

Network reach is where the three surfaces diverge most, and it is the lever prompt injection pulls to exfiltrate data. The cloud environment runs a two-phase model: the setup phase has internet access to install dependencies, then the agent phase runs with internet access off by default. The docs tie this directly to prompt injection - they describe a scenario where an attacker plants instructions in a GitHub issue and the agent, if it had network access, exfiltrates data such as commit history to an external server. When you do enable agent-phase access, you choose a preset (off, restricted, or unrestricted), and a restricted allowlist can be narrowed to specific domains and limited to GET, HEAD, and OPTIONS so the agent cannot POST data out.
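The restricted preset can be pictured as a two-part check on every outbound request: is the method read-only, and is the host on the allowlist? The Python sketch below models that decision for illustration only. The function name, the example domains, and the subdomain-matching rule are assumptions, not the platform's actual implementation.

```python
from urllib.parse import urlsplit

# Methods the restricted allowlist can be limited to, per the text above.
READ_ONLY_METHODS = {"GET", "HEAD", "OPTIONS"}


def is_request_allowed(method: str, url: str, allowed_domains: set[str]) -> bool:
    """Model of a domain allowlist combined with an HTTP method limit."""
    if method.upper() not in READ_ONLY_METHODS:
        return False  # POST, PUT, and the like are refused outright
    host = (urlsplit(url).hostname or "").lower()
    # Assumption: an allowlisted domain also covers its subdomains.
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


allowed = {"packages.example.com"}
print(is_request_allowed("GET", "https://packages.example.com/simple/", allowed))
print(is_request_allowed("POST", "https://packages.example.com/upload", allowed))
print(is_request_allowed("GET", "https://attacker.example.net/collect", allowed))
```

The injected-issue scenario fails both checks in this model: the attacker's server is not on the allowlist, and a POST is refused even to a permitted domain.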

The CLI starts from the same default but offers far coarser control. In workspace-write, network access is off by default and is enabled by a single key: sandbox_workspace_write.network_access = true. That is one line in a file an individual engineer edits, with no allowlist and no method restriction: egress is all-or-nothing per the sandbox config. The IDE inherits the same local behavior. So the most data-exposed configuration is also the easiest to reach: a developer flips one boolean to make a build work, and the laptop then has unrestricted egress while the cloud surface still enforces domain and method limits. Which engineers have flipped it is exactly the kind of state that belongs in an audit trail.

The three surfaces side by side

| Dimension | CLI | Cloud (web / GitHub) | IDE extension |
| --- | --- | --- | --- |
| Where code runs | Developer's machine | Ephemeral container, repo checked out at branch/SHA | Developer's machine (editor workspace) |
| Sandbox mechanism | Seatbelt on macOS; bubblewrap + seccomp on Linux/WSL2 | Container isolation; result returned as a diff | Same OS sandbox as CLI |
| Default write scope | workspace-write (edit inside workspace) | Container filesystem, ephemeral | workspace-write |
| Agent-phase network default | Off until sandbox_workspace_write.network_access = true | Off by default; presets off / restricted / unrestricted | Off (inherits CLI sandbox) |
| Network granularity | All-or-nothing per sandbox config | Domain allowlist + HTTP method limits (GET/HEAD/OPTIONS) | All-or-nothing per sandbox config |
| Who sets the policy | Individual engineer (config.toml) | Per environment, often org-managed | Individual engineer |

Why one policy has to span all three

Read the table top to bottom and the failure mode is obvious. The cloud surface can be governed centrally - per-environment network presets, domain allowlists, method restrictions - while the two local surfaces are configured per engineer in a file no one else reads. An organization can lock down Codex cloud beautifully and still have dozens of laptops running workspace-write with network_access = true and approval_policy = "never". The agent that the security team believes is sandboxed and offline is, on those endpoints, neither.

A defensible Codex policy is not three policies. It is one intended posture - say, no full-access sandbox, no unrestricted egress, approvals required for commands that cross the sandbox edge - applied and verified across every surface where Codex executes. The cloud half of that is enforceable through the platform. The local half depends on what is actually present in each engineer's config, and that is the half that drifts.

A reasonable working rule across the three surfaces:

  1. Treat the CLI and IDE as the same risk tier - both run on the credential-rich endpoint, so they need the same sandbox and approval baseline.
  2. Keep agent-phase network off everywhere by default, and where it must be on, prefer the cloud surface's domain allowlist and method limits over the CLI's all-or-nothing toggle.
  3. Forbid sandbox_mode = "danger-full-access" and approval_policy = "never" as standing configuration, and detect them when they appear rather than trusting they never will.
  4. Record what each endpoint is actually running so the intended policy and the observed policy can be compared, not assumed.

That last point is where the local surfaces defeat documentation alone. The docs tell you what config.toml can do; they cannot tell you what any given laptop has set. Anomity inventories Codex and the other AI artifacts on every managed endpoint - agents, MCP servers, extensions, skills, plugins, secrets, hooks, and CLIs - and reads their effective configuration, so the sandbox mode, approval policy, and network setting on each surface become visible instead of inferred. On agents that expose a hook, such as a PreToolUse-style check, Anomity returns allow, deny, or log on each tool call before it runs, and keeps a queryable 90-day record routed to your SIEM, Slack, email, or Jira. It does not replace the OS sandbox or the platform controls; it makes the local surfaces governable the way the cloud surface already is. If you want to see what your fleet's Codex configuration actually looks like across CLI, cloud, and IDE, that is where early access starts.