
Securing OpenAI Codex: Sandbox Modes, Approval Policies, and the Two-Phase Runtime (2026)

TL;DR
  • Securing OpenAI Codex is two independent dials, not one: sandbox_mode sets what the agent can technically do, and approval_policy sets when it must stop and ask you.
  • There are three sandbox modes - read-only, workspace-write (the default), and danger-full-access - and three approval policies - untrusted, on-request, and never.
  • In workspace-write, network access is off by default; you turn it on only with [sandbox_workspace_write] network_access = true, and .git, .codex, and .agents stay read-only either way.
  • Codex cloud runs in two phases: setup scripts get internet access and your secrets; the agent phase runs without internet access by default, and secrets are removed before it starts.
  • Enabling agent internet access carries four documented risks - prompt injection, exfiltration of code or secrets, malware, and license contamination - so allowlist only the domains and HTTP methods you need.
  • Anomity inventories Codex across the fleet, governs allow/deny/log at the hook, and writes every decision to a queryable 90-day audit trail.

Securing OpenAI Codex starts with one decision that is easy to get wrong: it is not a single safety switch but two independent dials, and confusing them leaves a gap. The sandbox decides what Codex can technically do when it runs a model-generated command - where it can write, whether it can reach the network. The approval policy decides when Codex must stop and ask you first. Set a tight sandbox but assume it also handles approvals, or pick a strict approval prompt but leave the sandbox wide, and the combination behaves differently than you expect. This guide walks through both dials, the cloud runtime's two phases and why your secrets vanish before the agent runs, and what to enforce at the endpoint, closing with a hardening checklist. For the broader pattern across coding agents, start with securing AI coding agents and CLIs.

Every behavior described here comes from OpenAI Codex's official sandboxing, agent-approvals, and cloud-environment docs, with one anchoring principle: the sandbox and approval layers enforce limits regardless of what the prompt asks the model to do. Instructions in a prompt shape intent; these controls decide capability. That gap between configured intent and actual enforcement is exactly where things go wrong, as the Codex branch-name command injection and GitHub token theft case showed when untrusted input reached a command path. Anomity exists to make these per-machine controls verifiable across a fleet - but configure them correctly first; we cover the fleet inventory and runtime governance at the hook at the end.

Throughout, the config keys are the real ones from the Codex configuration reference: sandbox_mode, approval_policy, and the [sandbox_workspace_write] table all live in config.toml. The same settings are exposed as CLI flags (--sandbox, --ask-for-approval) for one-off runs. Where the local CLI and Codex cloud diverge, the docs treat them as related but distinct surfaces, and so does this guide.
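As a minimal sketch of how the two surfaces line up, the Python below builds a one-off CLI invocation from the same two values you would otherwise set in config.toml. The --sandbox and --ask-for-approval flags are the ones named above; the helper name codex_argv is hypothetical, and the validation sets cover only the modes and policies this guide discusses (not the granular policy form).

```python
# Sketch: turn the two config.toml values into a one-off Codex CLI argv.
# codex_argv is a hypothetical helper; the flags are the documented ones.
SANDBOX_MODES = {"read-only", "workspace-write", "danger-full-access"}
APPROVAL_POLICIES = {"untrusted", "on-request", "never"}


def codex_argv(sandbox: str, approval: str, prompt: str) -> list[str]:
    """Return an argv list for a single run; reject unknown values early."""
    if sandbox not in SANDBOX_MODES:
        raise ValueError(f"unknown sandbox mode: {sandbox!r}")
    if approval not in APPROVAL_POLICIES:
        raise ValueError(f"unknown approval policy: {approval!r}")
    return ["codex", "--sandbox", sandbox, "--ask-for-approval", approval, prompt]


print(codex_argv("read-only", "on-request", "Summarize this repo"))
# ['codex', '--sandbox', 'read-only', '--ask-for-approval', 'on-request', 'Summarize this repo']
```

Passing the list to subprocess.run (rather than a shell string) keeps the prompt from being interpreted by a shell.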

How do the three sandbox modes differ?

Codex offers three sandbox modes, set via sandbox_mode. read-only lets Codex inspect files, but it cannot edit files or run commands without approval. workspace-write is the default: Codex can read files, edit within the workspace, and run routine local commands inside that boundary, while anything that needs the network or reaches beyond the workspace requires escalation. danger-full-access removes both the filesystem and the network boundaries entirely. The name is a warning, not branding: full access is the trust-boundary collapse examined in our analysis of the OpenAI Codex full-access trust boundary.

These are OS-level boundaries, not honor-system flags. On macOS, Codex uses the built-in Seatbelt framework; on Linux and WSL2 it uses bubblewrap user-namespace isolation with a bundled fallback; on Windows PowerShell it uses the native Windows sandbox. Within workspace-write you can extend write reach to additional directories through sandbox_workspace_write.writable_roots without dropping to full access. And even in a writable sandbox, the documented protected paths stay read-only: .git and resolved Git directory pointers, .codex, and .agents.

| sandbox_mode | Filesystem | Network | Use it for |
|---|---|---|---|
| read-only | Inspect only; no edits, no commands without approval | Blocked | Untrusted or unfamiliar repos; read-only CI |
| workspace-write (default) | Edit inside workspace; .git/.codex/.agents stay read-only | Off unless network_access = true | Day-to-day coding inside a trusted project |
| danger-full-access | No filesystem boundary | No network boundary | Disposable, isolated containers only |
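To make the workspace-write rules concrete, here is a simplified Python model of the filesystem boundary: writes are allowed inside the workspace and any writable_roots, except under the protected names. It is an illustration only, assuming absolute, already-normalized POSIX paths. Codex enforces the real boundary at the OS level, and this sketch applies the protected names at the top of each writable root while ignoring resolved Git directory pointers.

```python
from pathlib import PurePosixPath

# Illustrative model only; the real boundary is enforced by Seatbelt,
# bubblewrap, or the Windows sandbox, not by code like this.
PROTECTED_NAMES = {".git", ".codex", ".agents"}


def _is_within(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True if path is root itself or anywhere beneath it."""
    return path == root or root in path.parents


def can_write(path: str, workspace: str, writable_roots: tuple[str, ...] = ()) -> bool:
    """Model a workspace-write check for one absolute, normalized path."""
    target = PurePosixPath(path)
    for root in (PurePosixPath(workspace), *map(PurePosixPath, writable_roots)):
        if _is_within(target, root):
            rel = target.relative_to(root)
            # Protected names directly under a writable root stay read-only.
            return not (rel.parts and rel.parts[0] in PROTECTED_NAMES)
    # Outside every writable root: the command would need to escalate.
    return False


print(can_write("/repo/src/app.py", "/repo"))    # True
print(can_write("/repo/.git/config", "/repo"))   # False
print(can_write("/data/cache/blob", "/repo", ("/data/cache",)))  # True
```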

What does each approval policy actually do?

The approval policy is the second dial, set via approval_policy, and it governs when Codex pauses for you. untrusted runs known-safe read operations automatically but requires approval before commands that mutate state or trigger external execution, such as destructive Git operations. on-request - the standard interactive mode - has the agent ask when it needs to escalate beyond the sandbox boundary: editing files outside the workspace, accessing the network, or running certain commands. never disables approval prompts entirely, leaving the sandbox as the only control, which suits non-interactive CI. The model that pairs these two dials is the same one we contrast across tools in Claude Code vs Codex vs Cursor permission models.

Two newer options are worth knowing. A granular approval_policy lets you keep specific categories interactive - sandbox escalations, rule prompts, MCP elicitations - while auto-rejecting others, like skill scripts. And approvals_reviewer = "auto_review" routes eligible approval requests to an automatic reviewer agent that evaluates for data exfiltration, credential probing, and destructive operations, denying critical-risk actions and gating high-risk ones. Useful, but a reviewer is still software making a judgment, not a guarantee - treat it as defense in depth, not a replacement for a tight sandbox.
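The three policies can be read as a small decision function. The sketch below is a simplified model under stated assumptions, not Codex's logic: the "read"/"mutate" labels and the escalates flag are hypothetical inputs standing in for however Codex classifies a proposed command.

```python
def decide(policy: str, kind: str, escalates: bool) -> str:
    """Return 'run', 'ask', or 'blocked' for one proposed command.

    kind: hypothetical label, 'read' for known-safe reads or 'mutate' for
          anything that changes state or triggers external execution.
    escalates: True if the command must leave the sandbox boundary,
               e.g. network access or writes outside the workspace.
    """
    if policy == "never":
        # No prompts at all: the sandbox is the only control, so escalations fail.
        return "blocked" if escalates else "run"
    if policy == "on-request":
        # Work inside the sandbox runs freely; escalation needs a human.
        return "ask" if escalates else "run"
    if policy == "untrusted":
        # Only known-safe reads that stay in the sandbox run automatically.
        return "run" if kind == "read" and not escalates else "ask"
    raise ValueError(f"unknown approval policy: {policy!r}")


print(decide("untrusted", "mutate", escalates=False))   # ask
print(decide("on-request", "mutate", escalates=False))  # run
print(decide("never", "read", escalates=True))          # blocked
```

The never branch is why the sandbox choice carries all the weight in unattended runs: nothing is ever surfaced to a human.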

Which sandbox and approval combination fits each task?

Because the two dials are independent, the secure setup is choosing a pair that matches intent rather than reaching for one extreme. The docs give a clean set of recommended combinations, and the right way to read the table below is by the job in front of you - exploration, day-to-day work, or unattended automation - not by a single global default you apply everywhere. This is the same decision-by-context discipline we apply to Cursor in Cursor's auto-run, YOLO, and allow/deny limits.

| Intent | Sandbox + approval | Behavior |
|---|---|---|
| Explore an untrusted repo | read-only + on-request | Inspect freely; asks before any modification |
| Standard interactive coding | workspace-write + on-request | Edits workspace freely; asks before leaving sandbox or using network |
| Read-only CI | read-only + never | Reads files only; no prompts, no write, no network |
| Disposable container, hands-off | danger-full-access + never | No boundaries - only safe inside isolation you control |

The combination to scrutinize is anything that pairs broad capability with no prompts. danger-full-access plus never is exactly the configuration to avoid on a developer laptop or any machine with reachable credentials - it is the local equivalent of running with no seatbelt. If you find that pair in a CI script or a shared profile, treat it as a finding, not a convenience.
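A review script can encode that rule directly. This sketch maps a pair back to the intents in the table above and flags full access, or prompt-free writes, on a machine that is not a disposable container. The intent labels and the isolated input are hypothetical; whether a host really is isolated is something you have to establish yourself.

```python
# Recommended pairs from the table above, keyed by illustrative intent labels.
RECOMMENDED = {
    "explore untrusted repo": ("read-only", "on-request"),
    "interactive coding": ("workspace-write", "on-request"),
    "read-only CI": ("read-only", "never"),
    "disposable container": ("danger-full-access", "never"),
}


def review_pair(sandbox: str, approval: str, isolated: bool) -> dict:
    """Summarize one sandbox/approval pair.

    isolated: True only for a disposable container you control; this
    sketch cannot verify that on its own.
    """
    intents = [name for name, pair in RECOMMENDED.items() if pair == (sandbox, approval)]
    findings = []
    if sandbox == "danger-full-access" and not isolated:
        findings.append("full access outside a disposable, isolated container")
    if approval == "never" and sandbox == "workspace-write":
        findings.append("prompt-free writes: confirm the sandbox alone is sufficient")
    return {"intents": intents, "findings": findings}


print(review_pair("danger-full-access", "never", isolated=False))
```

The first finding fires even though the pair matches a table row: the "disposable container" recommendation only holds when the isolation is real.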

How is network access controlled in workspace-write?

Network access is the lever most likely to be set carelessly. In the workspace-write sandbox, commands have no network access by default; you enable it explicitly. The toggle lives in the [sandbox_workspace_write] table:

```toml
approval_policy   = "on-request"
sandbox_mode      = "workspace-write"

[sandbox_workspace_write]
network_access    = false   # default; commands cannot reach the network
writable_roots    = ["/path/to/extra/dir"]   # optional extra write scope
```

Leaving network_access = false is the conservative posture, and it is the value the documented managed baseline uses. Turning it on hands every model-generated command an egress path, which is precisely the channel an injected instruction would use to exfiltrate data. This is where keeping the sandbox and approval dials separate matters: once this one flag is flipped, a command can stay inside the sandbox boundary and still reach out, so network exposure deserves its own review rather than being folded into a general approval setting. The cross-process exfiltration risk this opens up is the core of our comment-and-control multi-agent prompt injection and credential theft analysis.

What happens in the setup phase versus the agent phase?

Codex cloud splits a task into two phases, and the split is a security feature, not just an implementation detail. In the setup phase, Codex runs your setup script with internet access enabled so it can install dependencies and tools. In the agent phase, the agent executes terminal commands in a loop - editing code, running checks - with internet access off by default. Recognizing which phase you are in tells you what the environment can reach and what credentials are present, which is the foundation of securing AI coding agents and CLIs end to end.

Secrets handling is the sharpest distinction. Environment variables are available for the full task, across both phases, with standard storage. Secrets are stored with an additional layer of encryption, are decrypted only for task execution, and are available only to setup scripts - they are removed before the agent phase starts. So a build credential a setup script needs should be a secret, not an environment variable, precisely so it is gone by the time model-generated commands run. One practical trap: setup scripts run in a separate Bash session from the agent, so an export does not persist into the agent phase; durable configuration has to go into ~/.bashrc or the environment settings.

| Property | Environment variables | Secrets |
|---|---|---|
| Available in setup phase | Yes | Yes |
| Available in agent phase | Yes | No - removed before it starts |
| Storage | Standard | Additional layer of encryption; decrypted only for execution |
| Use for | Non-sensitive configuration | Credentials a setup script needs but the agent must not see |
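The table reduces to a few lines of code. The sketch below is a hypothetical model of what each phase can see, plus a naming heuristic (not a Codex feature) that flags credential-looking names stored as environment variables, since those would still be present when the agent phase runs. The variable names in the usage lines are illustrative.

```python
import re


def visible_names(phase: str, env_vars: dict, secrets: dict) -> set:
    """Names a process could see in the given phase of a Codex cloud task."""
    if phase == "setup":
        return set(env_vars) | set(secrets)
    if phase == "agent":
        return set(env_vars)  # secrets are removed before the agent phase
    raise ValueError(f"unknown phase: {phase!r}")


# Heuristic only: credential-looking names stored as environment variables
# rather than secrets, which would therefore persist into the agent phase.
CREDENTIAL_HINT = re.compile(r"TOKEN|SECRET|PASSWORD|API_?KEY|PRIVATE_KEY", re.IGNORECASE)


def misfiled_credentials(env_vars: dict) -> list:
    return sorted(name for name in env_vars if CREDENTIAL_HINT.search(name))


env = {"NODE_ENV": "test", "NPM_TOKEN": "placeholder"}
secrets = {"REGISTRY_PASSWORD": "placeholder"}
print(sorted(visible_names("agent", env, secrets)))  # ['NODE_ENV', 'NPM_TOKEN']
print(misfiled_credentials(env))                      # ['NPM_TOKEN']
```

A name-based check will miss credentials with innocuous names, so treat an empty result as "nothing obvious", not "nothing sensitive".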

How do you harden agent internet access if you must enable it?

Sometimes the agent phase genuinely needs network - to fetch a package mid-task, for example. Codex cloud lets you enable agent internet access per environment, but the docs are blunt that doing so raises risk, listing four concerns: prompt injection from untrusted web content, exfiltration of code or secrets, downloading malware or vulnerable dependencies, and pulling in content with license restrictions. The way to enable it safely is to narrow it. The trust-boundary thinking here mirrors our breakdown of the OpenAI Codex CLI, cloud, and IDE security differences.

  • Choose the domain allowlist scope deliberately: none (custom domains only), a curated common-dependencies preset for typical build hosts, or all (unrestricted) - and reach for unrestricted only when nothing narrower works.
  • Restrict allowed HTTP methods to GET, HEAD, and OPTIONS, which blocks POST, PUT, PATCH, and DELETE - read paths stay open while the obvious write-based exfiltration verbs are closed.
  • Open only the specific domains and HTTP methods a task needs, then review the agent output and work log afterward, as the docs recommend.
  • Prefer moving network-dependent steps into the setup phase, where internet is available anyway and secrets still exist, so the agent phase can stay offline.
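A request filter combining the domain and method controls might look like the sketch below. It is a model, not Codex's implementation: the function name is hypothetical, and treating an allowlist entry as matching both the exact host and its subdomains is an assumption. It checks only the method and host, so data can still leave in a GET query string to an allowed domain.

```python
from urllib.parse import urlsplit

READ_ONLY_METHODS = {"GET", "HEAD", "OPTIONS"}


def request_allowed(url: str, method: str, allowed_domains: set, read_only: bool = True) -> bool:
    """Model an egress check: host must be allowlisted, method must be read-only."""
    host = (urlsplit(url).hostname or "").lower()
    # Assumption: an entry matches the exact host and any of its subdomains.
    domain_ok = any(
        host == d.lower() or host.endswith("." + d.lower()) for d in allowed_domains
    )
    method_ok = not read_only or method.upper() in READ_ONLY_METHODS
    return domain_ok and method_ok


allow = {"pypi.org", "pythonhosted.org"}
print(request_allowed("https://pypi.org/simple/requests/", "GET", allow))  # True
print(request_allowed("https://pypi.org/upload", "POST", allow))          # False
print(request_allowed("https://notpypi.org/", "GET", allow))              # False
```

Matching on the dot-prefixed suffix, rather than a bare endswith, is what keeps notpypi.org from passing as pypi.org.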

What is the OpenAI Codex hardening checklist?

Run this wherever Codex operates - local CLI and cloud environments alike. Each item maps to a control above and reflects documented behavior, not guesswork.

  1. Set sandbox_mode per context: read-only for untrusted repos and read-only CI, workspace-write for trusted day-to-day coding.
  2. Pair it with an approval_policy that matches intent - on-request for interactive work, never only when the sandbox alone is sufficient (read-only CI).
  3. Keep [sandbox_workspace_write] network_access = false unless a task provably needs egress; treat any true as a reviewable decision.
  4. Never ship danger-full-access to a developer laptop or any host with reachable credentials; confine it to disposable, isolated containers.
  5. Store build credentials as secrets, not environment variables, so they are removed before the agent phase; verify the agent phase has no sensitive variables.
  6. Confirm the protected paths (.git, .codex, .agents) are intact and not worked around by a setup script.
  7. If agent internet access is on, restrict to a domain allowlist and to GET/HEAD/OPTIONS, and review the work log after each run.
  8. Apply a managed configuration with conservative defaults (on-request, workspace-write, network off) and constrain who can change the security-sensitive settings.
  9. Keep repos version-controlled and prefer patch-based changes so anything the agent does is easy to review and revert.
  10. Verify the effective sandbox, approval, and network configuration across every endpoint - not one machine at a time.

How Anomity governs OpenAI Codex

Every control above is real, well-documented, and per-machine. sandbox_mode, approval_policy, and the network_access flag live in a config.toml on each developer's machine or in a cloud environment's settings; the managed baseline only governs endpoints that actually received it. There is no built-in answer to the fleet-level questions: which machines run Codex in danger-full-access, which set network_access = true, which never received the managed configuration, and which CI profiles pair full access with never. That blind spot is what turns one misconfigured endpoint into an incident.

Anomity is the layer that makes it verifiable. The endpoint daemon inventories Codex alongside the other AI artifacts on each managed machine - agents, MCP servers, extensions, skills, plugins, secrets, hooks, and CLIs - classifies them, and surfaces the sandbox, approval, and network settings actually in effect. The flow is concrete: fleet inventory finds where Codex runs and how it is configured; on agents that expose a hook, Anomity returns an allow, deny, or log decision at the hook on each tool call before it runs; and every decision is written to a queryable 90-day audit trail and routed to your SIEM, Slack, email, or Jira. It collects metadata only, with secret redaction on the endpoint, and is SOC 2 Type II compliant. See how it works for the deployment shape, and the comparison page for where it sits next to network, EDR, DLP, and GRC tools.

Anomity does not replace Codex's own sandbox and approval enforcement - it gives security a fleet-wide picture of it, so the conservative baseline you chose once becomes a policy you can confirm everywhere and prove on demand. To verify these controls instead of guessing at them, request early access, or frame the program first with the agentic AI governance guide.

Frequently asked questions

What is the difference between Codex sandbox mode and approval policy?

They are two separate controls that the official docs deliberately keep apart. Sandbox mode governs what Codex can technically do when it executes a model-generated command: where it can write and whether commands can reach the network. Approval policy governs when Codex must pause and ask you before it acts, for example before leaving the sandbox, using the network, or running a command outside a trusted set. You set them independently in config.toml as sandbox_mode and approval_policy. A tight sandbox with a permissive approval policy still confines the agent; a loose sandbox with frequent approvals still hands you the decision. Secure setups pair them, rather than relying on either alone.

Which sandbox and approval combination should I use?

Match the pair to intent. For exploring an unfamiliar or untrusted repository, use read-only with on-request so Codex can inspect files but asks before any change. For day-to-day coding, the default workspace-write with on-request lets the agent edit inside the workspace freely and asks before it leaves the sandbox or touches the network. For non-interactive CI that only needs to read, read-only with never runs without prompts and without write or network capability. Reserve danger-full-access, which removes the filesystem and network boundaries entirely, for disposable containers you have isolated yourself.

Does Codex have network access by default?

No. In the workspace-write sandbox, command network access is off unless you explicitly set [sandbox_workspace_write] network_access = true in config.toml. In Codex cloud the picture is phased: setup scripts run with internet access so they can install dependencies, but the agent phase has internet access turned off by default. You can then enable agent internet access per environment with options ranging from a curated common-dependencies allowlist to fully unrestricted. The docs are explicit that enabling agent internet access raises risk, so the safe posture is to leave it off and open only the specific domains and HTTP methods a task genuinely requires.

Why are secrets removed before the Codex agent phase?

In Codex cloud, secrets and environment variables behave differently on purpose. Environment variables are available for the full duration of a task, across both the setup and agent phases. Secrets are stored with an additional layer of encryption, are decrypted only for task execution, and are available only to setup scripts - they are removed before the agent phase starts. The reason is that the agent phase is where model-generated commands run and where prompt-injection and exfiltration risk concentrates. Keeping high-value credentials out of that phase means a compromised agent loop has nothing sensitive in its environment to leak. Put credentials a build needs in setup; never assume the agent can still see them.

Are any paths protected even in a writable sandbox?

Yes. Even in workspace-write, the documented protected paths stay read-only: the workspace .git directory and any resolved Git directory pointers, .codex, and .agents. This stops a model-generated command from rewriting Git history, tampering with Codex's own configuration, or editing agent definitions while still being allowed to edit your source. It is a sensible default, but it is a default the local user or a setup script can work around if other controls are loose - which is why version-controlling the repo before delegating and reviewing patch-based changes, both recommended in the docs, matter alongside the sandbox itself.

How do managed configurations constrain Codex for an organization?

Codex supports managed configuration so an administrator can set conservative defaults and constrain security-sensitive settings. A documented conservative baseline is approval_policy = "on-request", sandbox_mode = "workspace-write", and [sandbox_workspace_write] network_access = false, keeping network disabled unless explicitly allowed. Managed configuration can constrain the approval policy, the approvals reviewer, the automatic-review policy, the sandbox mode, permission profiles, web-search mode, managed hooks, and which MCP servers users may enable. The limitation is the same as with any per-machine policy: it only governs endpoints that actually received the managed configuration. A laptop running Codex outside that scope falls back to defaults you did not choose.
