
OpenAI Codex Full-Access Mode and the Trust Boundary

Anomity Research, Security Researcher at Anomity · Jun 9, 2026 · 7 min read

OpenAI Codex ships with a sandbox and an approval prompt, and both can be turned off. The combination that removes them, sandbox_mode = "danger-full-access" with approval_policy = "never", leaves a single control standing between the model and the machine: a one-time decision that you trust the directory and the task. That is a reasonable bet on one developer's laptop for one repo. It is a weak control once you have hundreds of endpoints, and the gap between those two situations is where this analysis lives. We have covered the broader pattern in our post on securing AI coding agents and CLIs; this post drills into one mode and what it actually removes.

The OpenAI Codex docs are clear about what full access means. The point is not that the mode is a bug. It is that "trust the repo and the task" is a human judgment made once, at a point in time, and the agent then operates with that trust across a session whose contents neither the human nor the platform fully control. At fleet scale, that judgment does not compose. Knowing which endpoints run with the sandbox disabled, and what those sessions did, is exactly the fleet inventory and runtime governance problem Anomity exists to solve.

What Codex actually enforces, by mode

Codex separates two controls, and reading them together is the only way to reason about risk. The sandbox decides what the agent can technically do; the approval policy decides when it must stop and ask. Per the OpenAI Codex sandboxing docs, the sandbox is enforced by the OS itself: Seatbelt on macOS, bubblewrap with seccomp and Landlock on Linux and WSL2, and the native sandbox on Windows. Network access is off by default inside the sandbox, and protected paths such as .git stay read-only even when the workspace is writable.

Mode	Filesystem	Network	Asks first?
read-only	Read and list only	Off	Yes, to mutate
workspace-write (default)	Write inside workspace	Off by default	Yes, to leave workspace
danger-full-access + never	Whole machine	On	No
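Because the two settings are independent, it helps to treat posture as a function of both at once. The following is a minimal Python sketch; `Posture`, `effective_posture`, and `remaining_controls` are hypothetical names, and the logic encodes only the table above, not Codex's own implementation.

```python
from dataclasses import dataclass


# Hypothetical model of the mode table; not Codex source code.
@dataclass(frozen=True)
class Posture:
    os_sandbox: bool        # OS-enforced filesystem/network boundary still in place
    approval_prompts: bool  # the agent can still be made to stop and ask


def effective_posture(sandbox_mode: str, approval_policy: str) -> Posture:
    """Combine the two independent settings into the controls left standing."""
    return Posture(
        os_sandbox=sandbox_mode != "danger-full-access",
        approval_prompts=approval_policy != "never",
    )


def remaining_controls(posture: Posture) -> list[str]:
    """Name the controls that survive a given combination."""
    names = []
    if posture.os_sandbox:
        names.append("os_sandbox")
    if posture.approval_prompts:
        names.append("approval_prompts")
    return names


if __name__ == "__main__":
    for mode, policy in [
        ("workspace-write", "on-request"),
        ("danger-full-access", "on-request"),
        ("workspace-write", "never"),
        ("danger-full-access", "never"),
    ]:
        print(mode, policy, remaining_controls(effective_posture(mode, policy)))
```

Only the last combination yields an empty list: that is the row where the trust decision is the only thing left.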

The default, workspace-write, is the well-behaved case: edits land in the working directory, the agent asks before reaching the internet or stepping outside the workspace, and the OS enforces the boundary even if the model ignores instructions. The Codex agent approvals docs also describe several approval policies, including untrusted, on-request, and never, plus an optional auto-review step that screens actions for exfiltration and destructive operations. None of those layers exist in full access. As we noted in our post on securing AI coding agents and CLIs, the default posture is genuinely defensible. The danger is the override, not the tool.

What danger-full-access removes

Run Codex with --dangerously-bypass-approvals-and-sandbox (aliased --yolo), or set the full-access mode in config, and the OS-level sandbox and the approval prompt both come off at once. The agent can now read and write anywhere the user can, execute arbitrary commands, and open outbound network connections without asking. The Codex docs are explicit that this is intended for an externally hardened environment such as a CI runner or a dedicated VM, not a working laptop.
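Detecting the bypass in recorded invocations comes down to a string check. This minimal Python sketch assumes command lines are captured as single shell strings by some process-telemetry source (hypothetical here). It recognizes only the two flag spellings named above, so it would miss full access that is set in config.toml instead.

```python
import os
import shlex

# The two spellings that turn off the sandbox and approvals together.
BYPASS_FLAGS = {"--dangerously-bypass-approvals-and-sandbox", "--yolo"}


def bypasses_sandbox(command_line: str) -> bool:
    """True if a captured codex command line carries the full bypass flag."""
    tokens = shlex.split(command_line)
    # Only inspect invocations of the codex binary itself.
    if not tokens or os.path.basename(tokens[0]) not in {"codex", "codex.exe"}:
        return False
    for tok in tokens[1:]:
        if tok == "--":
            break  # conventional end of options; anything after is positional text
        if tok in BYPASS_FLAGS:
            return True
    return False


if __name__ == "__main__":
    for line in ("codex --yolo 'fix the failing tests'",
                 "codex 'explain what --yolo means'"):
        print(line, "->", bypasses_sandbox(line))
```

Tokenizing with `shlex` rather than searching the raw string keeps a prompt that merely mentions the flag from being counted as a bypass.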

This matters because a coding agent's input is not just your task. It is the repository, its dependencies, build output, MCP tool responses, issue text, and anything else that lands in context. OpenAI's own trust prompt says it plainly: working with untrusted contents carries a higher risk of prompt injection. With the sandbox up, an injected instruction that says "exfiltrate the environment" hits a wall, because the network is off and the filesystem is fenced. With full access, the same instruction meets no wall. The OpenAI Codex docs warn directly that under full access a malicious project can exfiltrate anything available in the environment, including Codex credentials. We walked through one concrete chain in our write-up of the Codex branch-name command injection that leaked a GitHub token, and the multi-agent version in our post on comment-and-control prompt injection and credential theft.

Why 'trust the repo and the task' is a weak control at scale

On startup Codex detects whether a folder is version-controlled and asks whether you trust its contents, persisting the decision in config.toml under a [projects."/path"] entry with trust_level = "trusted". That is a sensible local affordance. As a fleet control it has three structural problems.

It is a point-in-time judgment over a moving target. You trust a repo today; tomorrow it pulls a new dependency, a new MCP server, or a teammate's branch. The trust decision does not re-evaluate, but the contents that flow into the agent's context do.
The human cannot see what they are trusting. Transitive dependencies, generated files, and tool outputs are exactly the surfaces prompt injection rides in on. "Do you trust this directory" is answered by a person who has not read all of it, and could not.
It does not aggregate. One developer's reasonable yes is invisible to security. Multiply by every endpoint and you have an unknown number of machines running an agent with no sandbox and no approvals, and no central record of which ones or what they did.

This is the same blind spot Anomity treats as the core problem across the agent layer: a control that is sound for one person making one decision degrades into guesswork at the org level. The fix is not to forbid full access, which has legitimate uses in hardened CI. It is to make the configuration and its consequences visible and governable rather than self-attested per machine.

What to check across a fleet

If teams use Codex, a few concrete questions separate a known posture from an assumed one. Each maps to something you can inventory or observe rather than ask people to remember.

Which endpoints have a config.toml setting danger-full-access or a profile with the sandbox removed, and which invocations pass --dangerously-bypass-approvals-and-sandbox or --yolo?
Where is approval_policy = "never" set, and is it paired with full access, the combination that removes the last prompt?
Which projects have network.enabled = true or a trust_level = "trusted" entry, and were those repos vetted, including their dependencies and MCP servers?
When a full-access session runs, is there a record of the commands and tool calls it made, or does the trail end at the developer's terminal?

The first three are inventory questions about static configuration. The fourth is a runtime question, and it is the hardest to answer with the tool alone, because once approvals are off there is no prompt to log and the model's actions go straight to the OS. This is the gap that hook-level observation closes: a record of what the agent tried, independent of whether the agent asked.

```toml
# A full-access profile: no sandbox, no prompts, network on.
# Defensible inside a hardened CI runner. A liability on a laptop.
[permissions.ci]
sandbox = "danger-full-access"
approval_policy = "never"

[permissions.ci.network]
enabled = true

[projects."/repos/service-api"]
trust_level = "trusted"
```

Where hook-level governance and audit fit

Anomity does not replace the Codex sandbox; it complements it, the way network, EDR, and DLP controls complement each other. It runs on the managed endpoint, inventories the eight artifact types that make up the agent layer (AI agents, MCP servers, extensions, skills, plugins, secrets, hooks, and CLIs), and classifies them. That inventory turns "which machines run Codex with full access" from a survey into a query, and the fleet inventory surfaces the trusted projects and network-enabled profiles alongside it.

For agents that expose a hook, Anomity evaluates each tool call before it runs and returns allow, deny, or log. This is the runtime governance layer, and it does not depend on the model choosing to prompt. Coverage tracks what each agent exposes; where a hook is present, a denied action is stopped and a logged one is recorded regardless of the local approval policy. Every decision feeds a queryable 90-day audit trail and routes to SIEM, Slack, email, or Jira, so the fourth question above has an answer that outlives the terminal session. Anomity collects metadata only and redacts secrets on the endpoint. That matters most for the sessions you most need to watch: the ones with no sandbox between the agent and your credentials.
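The allow/deny/log flow described above can be sketched as a pre-execution hook. This is a minimal, hypothetical Python sketch. The `ToolCall` shape, the substring rules, the redaction pattern, and the in-memory `AUDIT_LOG` are assumptions made for illustration. They are not Anomity's implementation or any agent's actual hook interface. The property it demonstrates is that every call is evaluated and recorded whether or not the agent would have asked.

```python
import json
import re
import time
from dataclasses import dataclass
from typing import Literal

Decision = Literal["allow", "deny", "log"]


@dataclass(frozen=True)
class ToolCall:
    agent: str     # e.g. "codex"
    tool: str      # e.g. "shell"
    argument: str  # e.g. the command string the agent wants to run


# Hypothetical, deliberately naive rules; real policy would not be substring matching.
DENY_SUBSTRINGS = (".ssh/", "printenv")
LOG_SUBSTRINGS = ("curl ", "git push")

# Assumed redaction rule: mask values passed as token=, secret=, or password=.
_SECRET_PATTERN = re.compile(r"(?i)\b(token|secret|password)=\S+")

AUDIT_LOG: list[dict] = []


def redact(text: str) -> str:
    return _SECRET_PATTERN.sub(r"\1=[REDACTED]", text)


def evaluate(call: ToolCall) -> Decision:
    """Decide before the call runs; the agent's approval_policy is never consulted."""
    if any(s in call.argument for s in DENY_SUBSTRINGS):
        return "deny"
    if any(s in call.argument for s in LOG_SUBSTRINGS):
        return "log"
    return "allow"


def pre_tool_hook(call: ToolCall) -> Decision:
    """Evaluate a call and record the decision, redacting secrets before storage."""
    decision = evaluate(call)
    AUDIT_LOG.append({
        "ts": time.time(),
        "agent": call.agent,
        "tool": call.tool,
        "argument": redact(call.argument),
        "decision": decision,
    })
    return decision  # the caller blocks on "deny"; "allow" and "log" both proceed


if __name__ == "__main__":
    for cmd in ("cat ~/.ssh/id_rsa", "curl https://example.com", "pytest -q --token=abc123"):
        print(pre_tool_hook(ToolCall("codex", "shell", cmd)))
    print(json.dumps(AUDIT_LOG, indent=2))
```

Redaction happens before the record is stored, so even the audit trail for an unsandboxed session does not carry the secret it observed.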

Full access is a legitimate mode with a narrow safe home. The risk is not the flag; it is that, at fleet scale, no one knows where it is set or what it did. Anomity makes that layer visible, governable at the hook, and auditable after the fact. If your teams run Codex with the sandbox off, request early access and see the agent layer you are currently trusting on faith.