OpenAI Codex Use Cases: A Practical Guide for Engineering Teams (2026)

TL;DR
  • Codex's documented use cases split cleanly by how much access they need: read-mostly work (PR review, understanding code) versus write-and-run work (refactors, migrations, UI builds, deployment).
  • Codex exposes three sandbox modes (read-only, workspace-write, danger-full-access) and three approval policies (untrusted, on-request, never) - the combination, not either dial alone, defines the blast radius.
  • PR review and codebase understanding run comfortably in read-only; OpenAI says Codex reviews the majority of its own PRs and surfaces only P0/P1 issues.
  • Refactors, migrations, and UI builds need workspace-write, where network is off by default and escalations prompt - good for an attended developer, riskier paired with never in CI.
  • On managed machines an org can enforce a requirements.toml that forbids danger-full-access or approval_policy = "never" - but nothing centrally records which mode actually ran.
  • Anomity inventories Codex across the fleet, returns allow/deny/log at the hook on each tool call, and keeps a queryable 90-day audit trail of every escalation.

Most teams adopting Codex start with the wrong question - "is Codex safe?" - when the useful question is "safe to do *what*?" The documented OpenAI Codex use cases span a wide range, from read-only pull request review to migrations that rewrite an authentication system across dozens of files, and each one implies a different amount of access. OpenAI says Codex reviews the majority of its own pull requests and surfaces hundreds of issues a day; that is a very different access profile from an unattended overnight refactor. This guide walks through the documented use cases grouped the way an engineering team actually deploys them, maps each to the sandbox and approval settings it needs, and then covers governing Codex across a fleet where every developer sets those dials independently.

The throughline is simple. Codex's safety is not a single property of the tool; it is the product of a use case and a configuration. The same binary that safely answers questions about a codebase in read-only mode can rewrite your repo and reach the network if someone flips two settings. Understanding which use case lives in which configuration is the difference between a controlled rollout and a blind spot - the kind we map for the whole category in securing AI coding agents and CLIs.

We will ground every behavior claim in OpenAI's own Codex documentation, group the use cases by access profile, and finish with the fleet question: who is running which use case, in which mode, against which data. That last part is where the fleet inventory starts to matter more than any single developer's config.

What are the two dials behind every Codex use case?

Before grouping use cases, fix the vocabulary, because the same two dials govern all of them. Codex separates a sandbox mode (what the agent can technically do to the filesystem and network) from an approval policy (when Codex must stop and ask). They are orthogonal - you pair one with the other - so the combination defines the blast radius of any task.

Per OpenAI's sandboxing documentation, the sandbox modes are read-only (Codex can inspect files but cannot edit them or run commands without approval), workspace-write (Codex can read, edit inside the workspace, and run routine local commands inside that boundary, with network off by default), and danger-full-access (no filesystem or network boundary). The approval policies are untrusted (only known-safe operations run automatically), on-request (Codex works inside the sandbox and asks when it needs to go beyond it), and never (no approval interruptions). We unpack the enforcement layer of these in the Codex sandbox and approval model writeup; here they are the axes we map use cases onto.
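To make the "combination, not either dial alone" point concrete, here is a minimal Python sketch that models the two dials as enums and summarizes what a given pair implies. The enum values are the mode and policy names quoted above; the `Posture` fields and the `posture` function are hypothetical names invented for illustration, not part of Codex.

```python
from dataclasses import dataclass
from enum import Enum


class Sandbox(Enum):
    READ_ONLY = "read-only"
    WORKSPACE_WRITE = "workspace-write"
    DANGER_FULL_ACCESS = "danger-full-access"


class Approval(Enum):
    UNTRUSTED = "untrusted"
    ON_REQUEST = "on-request"
    NEVER = "never"


@dataclass(frozen=True)
class Posture:
    write_scope: str             # "none", "workspace", or "anywhere"
    network_fenced: bool         # True while the sandbox still keeps a network boundary
    human_can_be_prompted: bool  # False when the approval policy is "never"


def posture(sandbox: Sandbox, approval: Approval) -> Posture:
    """Summarize the blast radius implied by one sandbox/approval pair."""
    write_scope = {
        Sandbox.READ_ONLY: "none",
        Sandbox.WORKSPACE_WRITE: "workspace",
        Sandbox.DANGER_FULL_ACCESS: "anywhere",
    }[sandbox]
    return Posture(
        write_scope=write_scope,
        network_fenced=sandbox is not Sandbox.DANGER_FULL_ACCESS,
        human_can_be_prompted=approval is not Approval.NEVER,
    )
```

For example, `posture(Sandbox.WORKSPACE_WRITE, Approval.NEVER)` still reports a fenced network and workspace-scoped writes, but no human who can be prompted, which is why the pair has to be read together.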

| Use-case family | Reads code | Edits files | Runs commands | Needs network |
| --- | --- | --- | --- | --- |
| PR review / understanding code | Yes | No | Rarely | No |
| Refactor / migration | Yes | Yes | Yes (build, test) | Sometimes |
| Build / iterate UI | Yes | Yes | Yes (dev server) | Often (preview) |
| Security scan / remediation | Yes | Yes (patch step) | Yes (validate) | Sometimes |
| Internal tools / data work | Yes | Yes | Yes | Often |

Read the table as an access-floor chart. A use case that only reads code has no business running in a mode that lets the agent write and reach the network - and the gap between the two is exactly where a misconfigured ~/.codex/config.toml quietly hands an agent more than the task requires.

How should you configure Codex for PR review and understanding a codebase?

These are the read-mostly use cases, and they are the safest place to start a rollout. OpenAI's documentation describes reviewing GitHub pull requests to "catch regressions and potential issues before human review," and understanding large codebases to "trace request flows, map unfamiliar modules, and find the right files fast." Neither requires editing your repo. OpenAI also notes Codex reviews the majority of its own PRs, configured to flag only P0 and P1 issues so the comments stay high-signal.

  • Sandbox: read-only is the correct floor - the agent inspects files and answers questions but cannot edit or run commands without an explicit prompt.
  • Approval: on-request keeps a human in the loop for any escalation; for purely conversational code exploration this rarely fires.
  • Network: off. PR review and code reading do not need it, and leaving it off shrinks the surface a prompt-injected diff could reach.

The subtlety: review is read-mostly, but Codex Security's scanning flow validates likely vulnerabilities in an isolated environment and can propose patches, which crosses into write territory. Keep the *review* step read-only and treat the *patch* step as a separate, write-enabled task with its own approval. A diff is also untrusted input - branch names, commit messages, and PR bodies are attacker-controllable, which is exactly the vector behind the Codex branch-name command injection and GitHub token theft chain.

What changes for refactors, migrations, and building UI?

These are the write-and-run use cases, and they move you off read-only. OpenAI documents refactoring to "remove dead code and modernize legacy patterns without changing behavior," running code migrations to "migrate legacy stacks in controlled checkpoints," and building responsive front-end designs with visual checks. All three need to edit files and run builds or tests, which means workspace-write.

In workspace-write the defaults are deliberately conservative: writes are scoped to the workspace and network is off, and Codex escalates - rather than failing silently - when an operation would edit outside the workspace or reach the internet. You can widen the writable area for a build without dropping to full access by listing roots explicitly:

```toml
# ~/.codex/config.toml
sandbox_mode = "workspace-write"
approval_policy = "on-request"

[sandbox_workspace_write]
# widen only what the build needs; do NOT enable network blindly
writable_roots = ["/Users/dev/project/.cache", "/tmp/build"]
network_access = false
```

| Scenario | Sandbox | Approval | Why |
| --- | --- | --- | --- |
| Attended refactor at the keyboard | workspace-write | on-request | Developer reviews each escalation in real time |
| Multi-file migration in checkpoints | workspace-write | on-request | Each pass lands as a separate reviewable diff |
| UI build needing a preview URL | workspace-write | on-request | Network prompts so the developer approves it knowingly |
| Unattended cleanup in CI | workspace-write | never | Still fenced by the OS sandbox; no human to prompt |
| Throwaway isolated container | danger-full-access | never | Only when nothing valuable is reachable |

The row that earns scrutiny is workspace-write + never: a reasonable CI posture, but from the outside it looks the same as a quiet, fully fenced session. Nothing in the terminal announces which row is live, which is why the fleet inventory - not the local config - is where this becomes answerable. For UI work specifically, the preview-deploy step is the first time network access becomes genuinely useful; approve it as a deliberate exception, not a standing default.

How do security scanning and remediation use cases differ?

Security work is where review and write use cases meet in one workflow, so it deserves its own treatment. OpenAI documents scanning code changes for security to "review a pull request or local diff for security regressions" and remediating a vulnerability backlog to "turn reviewed findings into minimal fixes with regression evidence." Codex Security scans connected repositories commit by commit, validates likely issues in an isolated environment to cut false positives, and can propose patches a reviewer inspects.

  1. Scan in read-only - the agent reads diffs and history; no write access is needed to surface findings.
  2. Validate in an isolated environment - reproduction runs where a successful exploit cannot touch anything that matters.
  3. Patch in workspace-write with on-request - the fix is a reviewable diff, and a human approves anything that escalates beyond the workspace.
  4. Record every approved escalation so the remediation has an audit trail, not just a merged commit - see the audit trail outcomes.

The trap is letting one long-running session carry full write-and-network access "so it can do the whole loop." That collapses three different access profiles into the most permissive one. Keep the phases distinct. When an agent processing untrusted findings also holds credentials and network, you are one prompt injection away from the failure mode in the multi-agent comment-and-control credential theft analysis.
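The cost of collapsing phases can be shown with a small Python sketch. It lists the scan and patch phases with the sandbox each one needs and computes what a single session would have to hold to cover both. The `Phase` type and both helper functions are hypothetical; the validation phase is left out because it runs in its own isolated environment rather than under one of the three local sandbox modes.

```python
from dataclasses import dataclass

# Sandbox modes ordered from least to most permissive.
SANDBOX_ORDER = ("read-only", "workspace-write", "danger-full-access")


@dataclass(frozen=True)
class Phase:
    name: str
    sandbox: str


PHASES = (
    Phase("scan", "read-only"),
    Phase("patch", "workspace-write"),
)


def collapsed_sandbox(phases) -> str:
    """Sandbox one long-running session needs to cover every phase: the most permissive."""
    return max((p.sandbox for p in phases), key=SANDBOX_ORDER.index)


def over_privileged(phases) -> list[str]:
    """Phases that would run with more access than they need in a collapsed session."""
    widest = collapsed_sandbox(phases)
    return [p.name for p in phases if p.sandbox != widest]
```

Here `over_privileged(PHASES)` returns `["scan"]`: in a single session the scan step, which handles the untrusted input, inherits write access it never needed.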

What about internal tools, data work, and computer-use automation?

OpenAI's documentation also lists Codex building and updating web apps with live preview URLs, cleaning and querying tabular data, synthesizing feedback into reviewable artifacts, and even driving a computer to click through product flows. These are the highest-access use cases in the catalog - they routinely need to edit files, run commands, and reach the network, and computer-use automation steps outside the repository entirely.

Two practical rules. First, data work should operate on copies - OpenAI's own framing is processing tabular data "without affecting the original" - so a bad transform never corrupts source data. Second, these are the use cases where danger-full-access is most tempting and least justified on a developer's primary machine; if a task genuinely needs unrestricted access, run it in a disposable environment where nothing valuable is reachable. The "how Codex governance works" view treats these as the cases to watch most closely, because they sit closest to real production data and credentials.

Can an organization enforce Codex settings centrally?

Partly. On managed machines, OpenAI's configuration documentation describes a requirements.toml an organization can use to enforce constraints - for example, disallowing approval_policy = "never" or sandbox_mode = "danger-full-access". That is a real floor: it prevents the most dangerous local combinations from being set at all. Pair it with a reviewed default config.toml and you have a sane baseline.

What requirements.toml cannot do is observe. It sets what is *allowed*; it does not record what *ran*. It will not tell you how many endpoints ran a workspace-write migration with network approved this week, whether someone widened a writable root to include a secrets directory, or what the agent actually did inside the boundary it was granted. Enforcement is per-endpoint and per-invocation; the fleet posture - who ran which use case, in which mode, against which data - is unowned by default. That is the same gap we trace across tools in how Claude Code, Codex, and Cursor permission models compare.

How Anomity governs OpenAI Codex

Anomity treats the use case and its configuration as facts to inventory, not choices to hope are set correctly. On every managed endpoint it discovers and classifies eight AI artifact types - AI agents, MCP servers, extensions, skills, plugins, secrets, hooks, and CLIs - so a Codex install, its sandbox mode, and its approval policy become queryable across the fleet rather than a per-developer setting buried in ~/.codex/config.toml (fleet inventory).

On agents that expose a hook, Anomity returns allow, deny, or log on each tool call *before it runs*. That turns the in-session approval prompt - the one only the developer at the keyboard sees - into an org-wide policy: a PR-review session can be held to read-only behavior, a migration's network escalation can require a deny-by-default with an explicit exception, and a danger-full-access invocation can be blocked outright (allow/deny/log at the hook). Anomity collects metadata only, with secret redaction on the endpoint, so enforcing this does not mean shipping your source or credentials anywhere.

Every one of those decisions lands in a queryable 90-day audit trail and routes to your SIEM, Slack, email, or Jira. Concretely: when a developer approved network access during a UI preview deploy on March 3rd, you can pull that record in June - which endpoint, which tool call, which decision - instead of relying on a prompt that vanished with the terminal session (audit trail outcomes). Anomity complements your Network, EDR, DLP, and GRC stack at the AI-agent layer those tools do not see; it does not replace them, as the comparison view lays out.

OpenAI Codex use cases are genuinely productive across review, refactoring, migration, security, and internal tooling - and the sandbox gives you real enforcement primitives for each. The work that remains is making those per-endpoint choices visible and governable at fleet scale. If that gap is yours to close, the agentic AI governance guide lays out the framework, and you can request early access to put it on every endpoint running Codex.

Frequently asked questions

What are the main OpenAI Codex use cases for engineering teams?

OpenAI's documentation groups Codex use cases into a few families: reviewing pull requests and catching regressions before human review, understanding large codebases by tracing request flows and mapping unfamiliar modules, refactoring to remove dead code and modernize legacy patterns, running code migrations in controlled checkpoints, building and iterating on front-end UI, scanning code changes for security regressions and remediating a vulnerability backlog, and building internal tools and data workflows. Each family implies a different access profile - read-mostly work needs far less than tasks that edit files and run commands.

Which Codex sandbox mode should I use for code review versus refactoring?

Pull request review and codebase understanding are read-mostly tasks, so read-only is the right floor - Codex can inspect files and answer questions but cannot edit or run commands without approval. Refactors, migrations, and UI work need workspace-write, where Codex can read, edit inside the workspace, and run routine local commands, with network off by default and escalations prompting for approval. Reserve danger-full-access for genuinely isolated environments: it removes both the filesystem and network boundaries, and pairing it with approval_policy = "never" removes the approval prompts as well.

Does Codex have network access while it works?

Not by default in the local sandbox. In workspace-write, filesystem writes are scoped to the workspace and network access is off; Codex asks before reaching the internet or going beyond the workspace boundary. In Codex cloud environments, execution splits into a setup phase that runs with internet enabled so dependencies install, and an agent phase that runs with internet off by default unless you configure otherwise. Only danger-full-access removes the network boundary outright. You can widen specific writable roots without dropping the whole sandbox.

How does Codex pull request review actually work?

Per OpenAI's documentation, Codex can review GitHub pull requests like a teammate, posting comments to catch regressions and potential issues before human review begins. OpenAI states Codex reviews the majority of its own PRs and surfaces hundreds of issues a day, configured to flag only P0 and P1 problems so the signal stays high. Codex Security extends this to commit-by-commit scanning that validates likely vulnerabilities in an isolated environment before surfacing them, which reduces false positives. Review itself is read-mostly; the patching step is where write access enters.

Can an organization restrict which Codex modes developers use?

Yes, on managed machines. OpenAI's configuration docs describe a requirements.toml an organization can use to enforce constraints - for example, disallowing approval_policy = "never" or sandbox_mode = "danger-full-access". That sets a floor on the local configuration. What it does not do is tell you which mode each endpoint actually ran, whether someone widened a writable root, or what the agent did inside the boundary it was granted. Enforcing the config and observing the fleet are separate problems; the first is necessary but not sufficient.

What should I check before letting Codex run a large migration?

Confirm the sandbox is workspace-write rather than danger-full-access, and decide whether the run is attended (approval_policy = "on-request") or unattended in CI (approval_policy = "never", still inside the sandbox boundary). Scope writable roots to exactly the directories the build needs instead of opening full access. Check that no live secrets sit in the shell or workspace the agent is writing in, since local runs do not get the cloud secret-strip behavior. Run the migration in reviewable checkpoints so each pass is a separate diff. Finally, make sure escalations and tool calls are logged somewhere durable, not just answered in one terminal.
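The secrets check in that list can be partly automated before a run starts. This Python sketch flags environment variable names that look like credentials; the name pattern and the `secret_like_vars` helper are assumptions for illustration, and a name-based match will miss secrets stored under innocuous names or in files.

```python
import os
import re

# Hypothetical pattern for variable names that usually hold credentials.
SECRET_NAME = re.compile(r"(TOKEN|SECRET|PASSWORD|API_KEY)", re.IGNORECASE)


def secret_like_vars(env=None) -> list[str]:
    """Return the sorted names of environment variables that look like secrets."""
    env = os.environ if env is None else env
    return sorted(name for name in env if SECRET_NAME.search(name))
```

Calling `secret_like_vars()` with no argument inspects the current shell environment; a non-empty result is a prompt to unset those variables or start the migration from a clean shell.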
