OpenAI Codex Branch-Name Command Injection and GitHub Token Theft - BeyondTrust Phantom Labs

AI Agent & CLI Security · High · No CVE assigned; reported via BugCrowd
Affected: OpenAI Codex cloud environment, Codex CLI, Codex SDK, Codex IDE Extension (pre-January 2026 fix)

BeyondTrust Phantom Labs disclosed a critical command injection flaw in OpenAI Codex: during repository setup, the coding agent ran shell commands embedded in an attacker-controlled Git branch name, which let an attacker capture the developer's GitHub user access token in plaintext. No CVE was assigned; the flaw was reported through BugCrowd in December 2025 and publicly disclosed on March 30, 2026. This advisory covers what was disclosed, why a branch name lands squarely on the agentic-endpoint layer, and how to inventory and govern this class of agent-triggered command execution.

What happened

When a Codex user ran a prompt against a GitHub repository, OpenAI Codex set up an environment container and cloned the target repository first. BeyondTrust Phantom Labs found that Codex did not sanitize the branch name before using it in a shell context during that clone, so a crafted branch name executed arbitrary commands during environment setup. Because the GitHub user access token was passed to the container in plaintext for the clone, the injected commands could read and exfiltrate it. OpenAI rated the issue critical.
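The general mitigation for this pattern can be sketched in a few lines. The example below is a minimal illustration, not OpenAI's actual fix: it validates the branch name against a deliberately strict allow-list and then passes it to git as a separate argument, so no shell ever parses it. The function names, the allow-list, and the length limit are assumptions made for this sketch.

```python
import re
import subprocess

# Hypothetical conservative allow-list: ASCII letters, digits, ".", "_", "/", "-".
# Real Git ref rules are broader; this sketch deliberately rejects anything unusual.
_SAFE_BRANCH = re.compile(r"[A-Za-z0-9][A-Za-z0-9._/-]{0,199}")


def is_safe_branch_name(name: str) -> bool:
    """Return True only for branch names made of plain, shell-inert characters."""
    if not _SAFE_BRANCH.fullmatch(name):
        return False
    # Reject sequences that Git itself refuses or that are easy to abuse.
    return not (".." in name or "//" in name or name.endswith(("/", ".", ".lock")))


def build_clone_argv(repo_url: str, branch: str, dest: str) -> list[str]:
    """Build an argument vector for git clone; the branch is one literal argument."""
    if not is_safe_branch_name(branch):
        raise ValueError(f"refusing unsafe branch name: {branch!r}")
    # "--" ends option parsing, so the URL and path cannot be read as flags.
    return ["git", "clone", "--depth", "1", "--branch", branch, "--", repo_url, dest]


def clone(repo_url: str, branch: str, dest: str) -> None:
    # A list argv with the default shell=False keeps "||", ";" and "$()" literal.
    subprocess.run(build_clone_argv(repo_url, branch, dest), check=True)
```

Either control alone would blunt the attack described here; using both means a gap in the allow-list does not become command execution.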

To get a hostile branch past a human reviewer, the researchers hid the payload behind 94 U+3000 Ideographic Space characters followed by || true and then the real commands. Bash ignores the U+3000 run as whitespace while the GitHub web UI rendered the branch as a benign main, so a reviewer saw nothing suspicious and the shell still ran the trailing commands. The flaw affected the ChatGPT website, the Codex CLI, the Codex SDK, and the Codex IDE Extension.
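Because the web UI can hide these characters, a check has to look at code points rather than rendered text. The following is a minimal sketch of such a check, assuming you already have the raw ref name as a string; the function name and the chosen Unicode categories and metacharacter set are illustrative assumptions, not a complete detection rule.

```python
import unicodedata

# Shell operators and expansion characters worth flagging in a ref name (illustrative set).
SHELL_META = frozenset("|&;$`<>(){}\\")
# Unicode general categories for spaces, separators, control, and format characters.
SUSPECT_CATEGORIES = {"Zs", "Zl", "Zp", "Cc", "Cf"}


def suspicious_ref_findings(ref: str) -> list[str]:
    """Return human-readable findings for a ref name; an empty list means nothing flagged."""
    findings = []
    hidden = [ch for ch in ref if unicodedata.category(ch) in SUSPECT_CATEGORIES]
    if hidden:
        codepoints = sorted({f"U+{ord(ch):04X}" for ch in hidden})
        findings.append(
            f"{len(hidden)} whitespace/invisible char(s): {', '.join(codepoints)}"
        )
    meta = sorted({ch for ch in ref if ch in SHELL_META})
    if meta:
        findings.append("shell metacharacters: " + " ".join(meta))
    return findings
```

For a ref built as "main" followed by 94 U+3000 characters and a shell operator, this reports both the hidden run and the operator, while a plain "main" produces no findings.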

BeyondTrust Phantom Labs reported the issue to OpenAI through BugCrowd in December 2025. OpenAI shipped an initial fix within a week and completed remediation of the branch shell escape in January 2026. No CVE was assigned, and public disclosure followed on March 30, 2026. The root cause is the familiar one for this class of coding-agent flaws: repository metadata that an agent treats as trusted, in this case the branch name, crossed into a shell context without sanitization.

Why this is an agentic-endpoint risk

The attack surface here is not a server or a malicious dependency - it is what a coding agent does on the endpoint when it acts on attacker-controlled repository metadata. A branch name is data, not code, but Codex treated it as part of a shell command while doing the most ordinary thing a coding agent does: cloning a repository to start a task. No phishing and no privilege escalation were needed - pointing Codex at a repository whose branch carried a hidden payload was enough to run commands and take the GitHub token.

This exposure is hard to see from the controls you already run, because it lives in the AI artifact layer. The Codex process looks legitimate to EDR; the clone looks like ordinary Git traffic to the network; and DLP sees nothing at rest because the token moves through a live shell, not a file. AI agents and CLIs are among the eight AI artifact types Anomity tracks per endpoint, and they are adopted bottom-up, which is how they became the new shadow IT. The question is not whether one developer's Codex is patched; it is which endpoints run the Codex CLI, Codex SDK, or Codex IDE Extension, and what commands those agents ran while setting up repositories. You cannot answer that without an inventory of the artifact layer and a record of the tool calls those agents made.

How Anomity surfaces and governs it

OpenAI's January 2026 remediation closes the branch shell escape, but the durable control is to treat repository metadata as untrusted input and govern what an agent is allowed to do when it acts on that input. Anomity does that in three steps.

First, inventory. Anomity inventories CLIs and AI agents on every managed endpoint as part of the eight AI artifact types it tracks, then classifies them. It captures which surfaces are present - Codex CLI, the Codex SDK, and the Codex IDE Extension - so you can find every endpoint that can clone a repository on a developer's behalf. Anomity collects metadata only; secrets such as the GitHub user access token are redacted on the endpoint before anything leaves it, so the artifact that this flaw targeted is never centralized in plaintext.
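Endpoint-side redaction of this kind can be illustrated with a short sketch. This is not Anomity's implementation: the function name and placeholder text are hypothetical, and the pattern assumes the publicly documented GitHub token prefixes (ghp_, gho_, ghu_, ghs_, ghr_, github_pat_) with a loose minimum length, since exact token lengths may change.

```python
import re

# Assumed token shapes: a known GitHub prefix followed by at least 20 token characters.
_GITHUB_TOKEN = re.compile(
    r"\b(?:gh[pousr]_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,})\b"
)

PLACEHOLDER = "[REDACTED_GITHUB_TOKEN]"


def redact_secrets(text: str) -> str:
    """Replace anything that looks like a GitHub token before the text leaves the host."""
    return _GITHUB_TOKEN.sub(PLACEHOLDER, text)
```

Applied to a captured command line such as a clone URL with an embedded ghu_ token, the sketch keeps the surrounding metadata (host, repository path) and drops only the credential.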

Second, decide at the hook. On agents that expose a hook - for example, the Claude Code PreToolUse event - Anomity evaluates each tool call against your policy and returns allow, deny, or log before the call runs. A shell invocation spawned from a branch name during setup can be denied at the boundary, which is exactly what runtime governance provides while a vulnerable agent build is still being rolled forward. This is the same untrusted-input-crosses-a-trust-boundary class seen in the sibling Claude Code project-file RCE and token exfiltration advisory, where repository-supplied configuration ran before consent.
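A hook-side policy decision might look like the sketch below. It is an illustration only: the event shape (a tool_name plus a tool_input carrying a command string), the tool names, and the rules are assumptions for this example, so check your agent's hook documentation for the real payload and response format.

```python
import re

# Hypothetical names for tools that execute shell commands.
_SHELL_TOOLS = {"Bash", "shell"}
# Non-ASCII spaces and invisible characters, including U+3000 Ideographic Space.
_HIDDEN = re.compile(
    r"[\u00a0\u2000-\u200f\u2028\u2029\u202f\u205f\u2060\u3000\ufeff]"
)
# Git subcommands that take a ref or remote supplied by the repository.
_GIT_WITH_REF = re.compile(r"\bgit\s+(clone|fetch|checkout|switch|pull)\b")


def decide(event: dict) -> str:
    """Return "allow", "deny", or "log" for one tool call, before it runs."""
    if event.get("tool_name") not in _SHELL_TOOLS:
        return "allow"
    command = str((event.get("tool_input") or {}).get("command", ""))
    if _HIDDEN.search(command):
        # Hidden whitespace in a shell command is treated as an injection signal.
        return "deny"
    if _GIT_WITH_REF.search(command):
        # Repository-driven Git operations are allowed but recorded for review.
        return "log"
    return "allow"
```

The ordering is a design choice: the deny rule runs before the log rule, so a Git command carrying hidden characters is blocked rather than merely recorded.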

Third, keep the record. Anomity logs the tool calls an agent makes, so command execution triggered by attacker-controlled repository metadata is recorded in a queryable 90-day audit trail, and decisions route to SIEM, Slack, email, or Jira. When a disclosure like the Codex branch-name flaw lands, you can answer which endpoints ran the affected Codex surfaces, which repository setups spawned shell commands, and what those agents were allowed to do - from a record, not a guess. Anomity complements your Network, EDR, DLP, and GRC tooling; it covers the artifact layer those were never built to inventory.
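To make the scope question concrete, here is a minimal sketch of what a tool-call record and a retention-bounded query could look like. The record fields, names, and query function are hypothetical and do not describe Anomity's schema; only the 90-day window comes from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ToolCallRecord:
    endpoint: str   # hypothetical host identifier
    agent: str      # e.g. "codex-cli"
    command: str    # redacted command line
    decision: str   # "allow", "deny", or "log"
    at: datetime    # when the call was evaluated (timezone-aware)


def calls_by_agent(records, *, agent: str, now: datetime, retention_days: int = 90):
    """Return records for one agent that fall inside the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [r for r in records if r.agent == agent and r.at >= cutoff]


def affected_endpoints(records, *, agent: str, now: datetime) -> set[str]:
    """Answer "which endpoints ran this agent?" from the record, not a guess."""
    return {r.endpoint for r in calls_by_agent(records, agent=agent, now=now)}
```

Passing the current time in explicitly keeps the query deterministic and easy to test against a fixed disclosure date.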

"You can't govern what you can't see." - The Anomity principle

What to check across your fleet

  • Inventory every endpoint that runs OpenAI Codex, record which surface is present - Codex CLI, Codex SDK, or Codex IDE Extension - and confirm each carries the January 2026 branch shell-escape remediation, including self-hosted or scripted SDK use that clones repositories.
  • Treat repository metadata - branch names, tags, submodule paths, and remote URLs - as untrusted input wherever an agent uses it in a shell context, not just for Codex.
  • Watch for branch and ref names containing runs of U+3000 Ideographic Space or other invisible Unicode followed by shell operators such as || true, which can hide a payload behind a benign-looking main.
  • Confirm command execution that originates from an agent acting on repository metadata is evaluated at a hook with allow/deny/log, so an injected shell call is stopped before it runs.
  • Verify the GitHub user access token and other agent credentials are redacted on the endpoint and never centralized in plaintext, so a clone-time exfiltration has nothing to read.
  • Verify every tool call and CLI configuration change is written to a 90-day audit trail and routed to your SIEM, so you can answer scope questions when the next coding-agent disclosure lands.
  • Cross-reference this inventory against the sibling Claude Code project-file RCE and token exfiltration advisory to find endpoints exposed to more than one repository-driven execution path.

The Codex branch-name flaw is a reminder that an agent's everyday work - cloning a repository to start a task - is an execution and credential path when repository metadata is treated as trusted. Confirm your Codex surfaces carry the January 2026 fix, then inventory the CLIs and AI agents your endpoints run and govern the tool calls those agents make at the hook. For the full coding-agent attack surface and the disclosures it sits within, see the pillar guide on securing AI coding agents and CLIs. To see Anomity inventory and govern the agent and CLI layer across your fleet, request early access.

Frequently asked questions

What is the OpenAI Codex command injection vulnerability?

On March 30, 2026, BeyondTrust Phantom Labs disclosed a critical command injection flaw in OpenAI Codex. During task setup, Codex cloned a GitHub repository but did not sanitize the branch name before using it in a shell context, so a crafted branch name ran arbitrary commands in the environment container. Because the GitHub user access token was passed in plaintext during the clone, the injected commands could capture it. OpenAI rated the issue critical. No CVE was assigned. It affected the ChatGPT website, Codex CLI, Codex SDK, and the Codex IDE Extension before the January 2026 fix.

How did the 94 ideographic-space trick evade human review?

To hide the payload from a reviewer, researchers prefixed it with 94 U+3000 Ideographic Space characters followed by || true and then the real commands. Bash treats the U+3000 run as whitespace and ignores it, while the GitHub web UI rendered the branch as a benign main. A human glancing at the branch in GitHub saw nothing alarming, yet the shell still executed the trailing commands during Codex's clone and setup. It is a clean example of why a branch name, repository metadata an agent treats as trusted, must be handled as untrusted input on the endpoint.

Was a CVE assigned and when was it fixed?

No CVE was assigned to this issue. BeyondTrust Phantom Labs reported it to OpenAI through BugCrowd in December 2025. OpenAI shipped an initial fix within a week of the report and completed remediation of the branch shell escape in January 2026. The flaw was publicly disclosed on March 30, 2026. Because the fix is server-side for the cloud environment and shipped across the Codex CLI, Codex SDK, and Codex IDE Extension, the durable control beyond the patch is to inventory which Codex surfaces run on which endpoints and to record the tool calls those agents make.

How does Anomity reduce exposure to this class of flaw?

Anomity inventories CLIs and AI agents on every managed endpoint as part of the eight AI artifact types it tracks, so you can find Codex CLI, the Codex SDK, and the Codex IDE Extension across the fleet. On agents that expose a hook, it returns allow, deny, or log on each tool call before it runs, so command execution triggered by attacker-controlled repository metadata such as a branch name can be denied at the boundary. Every decision and configuration change lands in a queryable 90-day audit trail routed to your SIEM, Slack, email, or Jira, so you can answer what an agent actually did.
