vLLM Hardcoded trust_remote_code Override Enables RCE - CVE-2026-27893

Anomity Threat Research · Mar 27, 2026 · 5 min read

LLM Gateways & Proxies · High · CVE-2026-27893 (GHSA-7972-pg2x-xr59) · Mar 27, 2026

Affected: vLLM 0.10.1 up to, but not including, 0.18.0; fixed in 0.18.0

On March 27, 2026, CVE-2026-27893 (also tracked as GHSA-7972-pg2x-xr59) was published against the vLLM inference server. It is a remote code execution flaw rated CVSS 8.8 that bypasses an operator's explicit security opt-out: vLLM runs remote model code even when the operator started it with --trust-remote-code=False. It affects vLLM 0.10.1 through 0.17.x and is fixed in 0.18.0. This advisory covers what the bug does, why an inference server is part of the agentic-endpoint surface, and how Anomity surfaces and governs the agents that drive it.

What happened

vLLM is a high-throughput inference server that loads and serves large language models, often behind an OpenAI-compatible API so agents, CLIs, and other tooling can send model traffic to a single endpoint. When loading a model, vLLM may instantiate sub-components defined by the model repository. The trust_remote_code setting controls whether vLLM will execute Python shipped inside a model repository, and operators disable it with --trust-remote-code=False precisely to refuse that execution.

In the affected versions, several model implementation files hardcode trust_remote_code=True when loading a model's sub-components. That hardcoded value directly overrides the operator's configuration, so the --trust-remote-code=False opt-out is silently discarded for those code paths. As a result, a malicious model repository can execute arbitrary code on the inference server when its sub-components are loaded, even though the operator deliberately disabled remote-code trust.
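
The override pattern described above can be illustrated with a toy example. This is a minimal sketch and not vLLM's actual code: ModelConfig, load_subcomponent, load_vulnerable, and load_fixed are hypothetical names invented here, and the loader only reports what it would do rather than loading anything.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Hypothetical stand-in for the operator's settings."""
    model: str
    trust_remote_code: bool = False


def load_subcomponent(repo: str, *, trust_remote_code: bool) -> str:
    """Hypothetical loader: reports whether repository code would run."""
    return "executes repo code" if trust_remote_code else "refuses repo code"


def load_vulnerable(config: ModelConfig) -> str:
    # Bug pattern: the literal True shadows whatever the operator configured.
    return load_subcomponent(config.model, trust_remote_code=True)


def load_fixed(config: ModelConfig) -> str:
    # Fix pattern: pass the operator's setting through unchanged.
    return load_subcomponent(
        config.model, trust_remote_code=config.trust_remote_code
    )


config = ModelConfig(model="example-org/example-model", trust_remote_code=False)
print(load_vulnerable(config))  # executes repo code
print(load_fixed(config))       # refuses repo code
```

The operator's value never reaches the loader in the first variant, which is why no command-line setting can compensate on an affected build.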

Exploitation requires inducing a target to load a model from an attacker-controlled or compromised repository. An attacker who publishes a malicious repository targeting an affected architecture can achieve arbitrary code execution the moment vLLM loads it. This is a model-loading bug, not a network bug on the serving path, so the trigger is loading an untrusted artifact rather than a request to the running server.

The fix in vLLM 0.18.0 removes the hardcoded override so the operator's trust_remote_code setting is honored again. Notably, this is the third trust_remote_code bypass in vLLM, following CVE-2025-66448 and CVE-2026-22807 in different code paths. Each was a distinct route to the same outcome, which marks model loading as a recurring remote-code-execution surface for inference servers.

Detail	Value
Identifier	CVE-2026-27893 (GHSA-7972-pg2x-xr59)
Type	RCE via hardcoded trust_remote_code override (model loading)
CVSS	8.8 (High)
Affected	vLLM 0.10.1 – 0.17.x
Fixed in	0.18.0
Trigger	Loading a model from an attacker-controlled or compromised repository
Lineage	Third trust_remote_code bypass after CVE-2025-66448 and CVE-2026-22807
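
The affected range in the table can be turned into a simple check for inventory scripts. The following is a rough sketch that assumes version strings begin with major.minor or major.minor.patch numbers; parse_release and is_affected are illustrative names, and suffixes such as release-candidate or local build tags are ignored, so a pre-release of 0.18.0 would need separate handling.

```python
import re

FIRST_AFFECTED = (0, 10, 1)  # first affected release per the advisory
FIRST_FIXED = (0, 18, 0)     # first fixed release per the advisory


def parse_release(version: str) -> tuple[int, int, int]:
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0 (assumption of this sketch).
    return int(major), int(minor), int(patch or 0)


def is_affected(version: str) -> bool:
    """True when the version falls in the range 0.10.1 <= v < 0.18.0."""
    return FIRST_AFFECTED <= parse_release(version) < FIRST_FIXED


for candidate in ("0.10.0", "0.10.1", "0.17.3", "0.18.0"):
    print(candidate, is_affected(candidate))
```

Comparing integer tuples avoids the string-comparison trap where "0.9.2" would sort after "0.10.1".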

Why this is an agentic-endpoint risk

An inference server rarely sits alone. On a managed endpoint the vLLM process is an AI artifact in its own right, alongside the Claude Code sessions, MCP servers, and command-line agents that point at it. The model artifacts vLLM loads are effectively code, and CVE-2026-27893 is the case where that code runs against the operator's stated wishes.

The risk is the trust override. An operator who set --trust-remote-code=False reasonably believed remote model code would not run. When the opt-out is ignored, a single untrusted model pull becomes arbitrary code execution on a host that often holds provider keys, internal network reach, and the agents that route through it. Network and EDR controls see the process and its connections, but cannot tell you which endpoints run an affected vLLM build, which agents drive it, or where the loaded model came from.

This is the same artifact-layer blind spot we track across the gateway cluster, including the sibling case in LiteLLM pre-auth SQL injection - CVE-2026-42208. The inference server is one node in a graph of AI artifacts, and you can't govern what you can't see. A fleet-wide inventory of every AI artifact is the precondition for answering which hosts are exposed and what drives them.

How Anomity surfaces and governs it

Anomity inventories eight AI artifact types on every managed endpoint: AI agents, MCP servers, extensions, skills, plugins, secrets, hooks, and CLIs. For CVE-2026-27893 that means the vLLM process and its version are catalogued alongside the agents and CLIs that drive it, so you can answer "which endpoints run an affected vLLM build, and what talks to it" from the fleet inventory instead of guessing.

On agents that expose a hook, such as Claude Code PreToolUse, Anomity returns allow, deny, or log on each tool call before it runs. That is the enforcement point in runtime governance: a tool call that launches a known-vulnerable vLLM build, or that pulls a model from a repository outside policy, can be denied or logged in line rather than discovered after code already ran. Anomity collects metadata only and redacts secrets on the endpoint, so the inventory and decisions never expose the credentials reachable from the inference host.
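
To make the allow, deny, or log idea concrete, here is a hypothetical sketch of a decision function over a shell command string. It does not represent Anomity's policy engine or any agent's hook protocol: decide and APPROVED_REPOS are invented for illustration, and the sketch assumes the model is the first argument immediately after `vllm serve`.

```python
import shlex
from typing import Optional

# Hypothetical policy data: repositories the operator has approved.
APPROVED_REPOS = {"example-org/example-model"}


def decide(command: str) -> str:
    """Return 'allow', 'deny', or 'log' for a shell command string."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "log"  # unparseable input: record it for review
    if tokens[:2] != ["vllm", "serve"]:
        return "allow"  # not a model-serving launch
    # Assumption: the model is the token right after "serve", if present.
    model: Optional[str] = tokens[2] if len(tokens) > 2 else None
    return "log" if model in APPROVED_REPOS else "deny"


print(decide("ls -la"))
print(decide("vllm serve example-org/example-model"))
print(decide("vllm serve attacker/evil-model --port 8000"))
```

A real policy would also need to account for the installed vLLM version and for model references passed through config files or environment variables, which this sketch does not attempt.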

Every decision is written to a queryable 90-day audit trail. After a model-loading flaw like this, that trail is what lets responders scope the event: which agents drove vLLM, when a model was loaded, where it came from, and what each call was allowed to do. Anomity routes those decisions to SIEM, Slack, email, or Jira so the right team sees them in the tool they already use. The result is the timeline and the enforcement record described under outcomes.

Anomity complements your existing Network, EDR, DLP, and GRC controls rather than replacing them; it adds the agentic-endpoint layer those tools cannot see. See how it works and how Anomity compares for where it fits, and the documentation for deployment detail.

What to check across your fleet

Identify every endpoint and service running vLLM and record the exact version; treat anything from 0.10.1 through 0.17.x as affected.
Upgrade to vLLM 0.18.0 or later, which removes the hardcoded trust_remote_code override so --trust-remote-code=False is honored again.
Do not rely on --trust-remote-code=False as a control on affected builds; confirm the flag is actually enforced only after upgrading.
Inventory every model repository your servers load and pin to known-good, provenance-verified artifacts rather than mutable upstream references.
Treat any model loaded on an affected build from a repository you do not fully control as a potential code-execution event, and review host process and outbound network logs.
Rotate credentials reachable from the inference host, including upstream provider keys and any tokens the vLLM process can read.
Enumerate which AI agents, CLIs, and MCP servers drive vLLM, using a fleet-wide AI artifact inventory.
Confirm hook-based allow/deny/log enforcement is active on agents that load or serve models, so calls that launch a vulnerable build or pull an unapproved model can be blocked.
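
The pinning item in the checklist above can be sketched as an allowlist check. This is an illustrative example only: APPROVED_MODELS and is_pinned_and_approved are hypothetical names, the repository and hash are placeholders, and the sketch assumes revisions are identified by full 40-character hexadecimal commit hashes, so branch names or tags such as "main" are rejected as mutable.

```python
import re
from typing import Optional

# Hypothetical allowlist: repository id -> approved 40-character commit hash.
APPROVED_MODELS = {
    "example-org/example-model": "0123456789abcdef0123456789abcdef01234567",
}

COMMIT_HASH = re.compile(r"[0-9a-f]{40}")


def is_pinned_and_approved(repo_id: str, revision: Optional[str]) -> bool:
    """True only for an approved repository pinned to its approved commit."""
    if revision is None or COMMIT_HASH.fullmatch(revision) is None:
        return False  # missing or mutable reference (branch, tag, short hash)
    return APPROVED_MODELS.get(repo_id) == revision


print(is_pinned_and_approved(
    "example-org/example-model",
    "0123456789abcdef0123456789abcdef01234567",
))
print(is_pinned_and_approved("example-org/example-model", "main"))
```

Pinning to an immutable commit means a later malicious push to the same repository does not change what the server loads.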

CVE-2026-27893 turns one untrusted model pull into code execution on the inference host, and it is the third time this class of bug has surfaced in vLLM. That recurrence is why the AI artifact layer needs its own inventory and enforcement. For the full cluster context, see the pillar on securing LLM gateways and proxies. To see Anomity inventory your agents, govern tool calls at the hook, and keep a 90-day audit trail, request early access.

Frequently asked questions

Does upgrading vLLM to 0.18.0 fully resolve CVE-2026-27893?

Upgrading to vLLM 0.18.0 removes the hardcoded trust_remote_code=True from the affected model implementation files, so an operator's --trust-remote-code=False setting is honored again when sub-components load. That closes this specific bypass. It does not retroactively protect a host that already loaded a model on a vulnerable build (0.10.1 through 0.17.x). If an affected server ever loaded a model from a repository you do not fully control, treat that host as potentially compromised: review process and outbound network logs, rotate any credentials reachable from the inference host, and re-verify the provenance of every model artifact pinned in your deployment.

If I set --trust-remote-code=False, why did vLLM still execute remote code?

The opt-out was overridden inside vLLM itself. Several model implementation files hardcoded trust_remote_code=True when loading a model's sub-components, so the value you passed on the command line was discarded for those code paths. When a malicious or compromised model repository was loaded, the sub-component loader fetched and ran attacker-controlled Python regardless of your configuration. The flag worked at the top level but not where it mattered for sub-component loading, which is why disabling remote-code trust did not protect the inference server.

How is CVE-2026-27893 different from the earlier vLLM trust_remote_code CVEs?

It is the third trust_remote_code bypass in vLLM, after CVE-2025-66448 and CVE-2026-22807, each in a different code path. CVE-2025-66448 abused auto_map resolution in a config class; CVE-2026-22807 loaded Hugging Face dynamic modules without gating on the trust flag. In CVE-2026-27893, the model implementation files themselves hardcode trust_remote_code=True. The common thread is that model loading keeps resurfacing as a remote-code-execution surface on inference servers, so each individual fix matters less than treating untrusted model artifacts as code on every endpoint that loads them.

How does Anomity help when an inference server like vLLM is the exposure?

Anomity treats the vLLM process as an AI artifact on the endpoint, so it inventories the running server, its version, and the agents and CLIs that drive it. On agents that expose a hook, such as Claude Code PreToolUse, Anomity returns allow, deny, or log on each tool call before it runs, so a call that launches an affected vLLM build or pulls a model from an unapproved repository can be denied or logged in line. Every decision is written to a queryable 90-day audit trail and routed to SIEM, Slack, email, or Jira, giving responders the timeline to scope a model-loading incident.