
LightLLM Unauthenticated Pickle Deserialization RCE in PD WebSockets - CVE-2026-26220

Anomity Threat Research · Feb 16, 2026 · 5 min read

LLM Gateways & Proxies · Critical · CVE-2026-26220 (GHSA, LightLLM issue #1213) · Feb 16, 2026

Affected: LightLLM 1.1.0 and prior; no vendor fix at disclosure

In mid-February 2026, CVE-2026-26220 was disclosed against the LightLLM serving engine: an unauthenticated remote code execution flaw in its prefill-decode (PD) disaggregation system, rated CVSS 9.3 and tracked publicly via GHSA and LightLLM issue #1213. It affects LightLLM 1.1.0 and all prior releases, and there was no validated vendor fix at disclosure. This advisory covers what the bug exposes, why a serving-engine compromise is an agentic-endpoint problem, and how Anomity surfaces and governs the agents that route through it.

What happened

LightLLM is a Python LLM inference and serving engine. Its PD disaggregation mode splits the prefill and decode phases of inference across separate processes so they can scale independently, coordinated by a PD master node. To do that coordination, the PD master exposes WebSocket endpoints over the network.

In LightLLM 1.1.0 and earlier, those WebSocket endpoints receive binary frames and pass the data straight to Python's pickle.loads() with no authentication and no validation. Python pickle is not a safe deserialization format: a crafted payload can carry opcodes that execute arbitrary code during unpickling. So a remote attacker who can reach the PD master sends a single crafted binary frame and runs code on the serving host.
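To illustrate why pickle.loads() is unsafe on untrusted bytes, here is a minimal and deliberately harmless sketch. It is not LightLLM code and not an exploit: the class name Innocuous is made up, and the callable it names is the built-in sorted. It relies only on pickle's documented __reduce__ hook, which lets a serialized object name a callable for the unpickler to invoke.

```python
import pickle


class Innocuous:
    """Hypothetical class: any object can tell pickle how to rebuild it."""

    def __reduce__(self):
        # pickle records "call sorted([3, 1, 2])" in the byte stream.
        # The callable is chosen by whoever produced the bytes, which is
        # why unpickling attacker-controlled data amounts to code execution.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Innocuous())

# The receiver only sees bytes. Loading them invokes the stored callable,
# so the result is whatever that callable returned, not an Innocuous object.
result = pickle.loads(payload)

print(type(result).__name__, result)  # prints: list [1, 2, 3]
```

The receiver never asked to run sorted; the byte stream told it to. A data-only format such as JSON has no equivalent hook, which is why it is the usual replacement for control messages.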

A nonce-based authentication check was present in the code and should have stopped unauthenticated frames. But the nonce defaulted to an empty string, and an empty string is falsy in Python, so the guard condition never evaluated to true and the nonce comparison never ran. The result is an unauthenticated path from a network frame to pickle.loads().
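The falsy-default pattern described above can be sketched as follows. This is an illustration of the bug class, not LightLLM's actual source: the function names buggy_guard and fail_closed_guard and their parameters are hypothetical. The second function shows one common way to fail closed when no secret is configured.

```python
import hmac


def buggy_guard(expected_nonce: str, presented_nonce: str) -> bool:
    """Hypothetical guard; returns True when the frame is accepted."""
    if expected_nonce:  # "" is falsy, so this whole branch is skipped
        if presented_nonce != expected_nonce:
            return False
    return True


def fail_closed_guard(expected_nonce: str, presented_nonce: str) -> bool:
    """Hypothetical alternative that rejects when no nonce is configured."""
    if not expected_nonce:
        return False  # an unset secret means nobody gets in
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(
        expected_nonce.encode(), presented_nonce.encode()
    )


print(buggy_guard("", "anything"))        # prints: True
print(fail_closed_guard("", "anything"))  # prints: False
```

With a non-empty nonce both guards reject a wrong value; they differ only in the unconfigured case, which is the case the default produced.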

Worse, the endpoints are network-exposed by design: the server code asserts against binding to localhost, so the PD master listens on a routable interface rather than loopback. Combined with the project's history of leaving security reports open, operators should treat any reachable PD master as exploitable. The recommended mitigation is to isolate the PD endpoints on a trusted network segment and block external access until a validated fix is available.
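One way to verify that isolation is actually in effect is to attempt a plain TCP connection to the PD master from a vantage point that should be blocked. The sketch below uses only the standard library; the function name tcp_reachable is hypothetical, and the host and port in the usage comment are placeholders, since the advisory does not state which port a given deployment uses.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable hosts.
        return False


# Usage, run from an untrusted segment (placeholder address and port):
#   if tcp_reachable("192.0.2.10", 8000):
#       print("PD master is reachable from here; isolation is not in effect")
```

A successful connection only shows that the port is open from that vantage point. It sends no WebSocket frames and says nothing about whether the host has already been compromised.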

Detail	Value
Identifier	CVE-2026-26220 (GHSA, LightLLM issue #1213)
Type	Unauthenticated pickle deserialization RCE (PD master WebSocket)
CVSS	9.3 (Critical)
Affected	LightLLM 1.1.0 and all prior releases
Fixed in	No validated vendor fix at disclosure (2026-02-16)
Root cause	Empty-string (falsy) default nonce disabled the auth check; endpoints exposed by design

Why this is an agentic-endpoint risk

A serving engine rarely sits alone. LightLLM exists because AI agents, CLIs, and developer tooling need a place to send inference traffic. On a managed endpoint, the LightLLM PD process is an AI artifact in its own right, and so are the Claude Code sessions, MCP servers, and command-line agents that point at it. When that process can be turned into arbitrary code execution by anyone on the network, it is the most dangerous AI artifact on the host.

The blast radius runs past the engine itself. Code execution on the serving host gives an attacker the model weights, any provider credentials reachable from that host, and a foothold to pivot toward the agents and pipelines that depend on it. Network and EDR controls can see the WebSocket connection, but not which agents on which endpoints were configured to route inference through the affected PD master, or what those agents were allowed to do once they reached it.

This is the same artifact-layer blind spot we track across the gateway cluster, including the sibling case in LiteLLM pre-auth SQL injection - CVE-2026-42208. The serving engine is one node in a graph of AI artifacts, and you can't govern what you can't see. Fleet-wide inventory of every AI artifact is the precondition for scoping an incident like this one.

How Anomity surfaces and governs it

Anomity inventories eight AI artifact types on every managed endpoint: AI agents, MCP servers, extensions, skills, plugins, secrets, hooks, and CLIs. For CVE-2026-26220 that means the LightLLM process and its version are catalogued alongside the agents and CLIs that route inference through it, so you can answer "which endpoints run an affected LightLLM build, and what talks to it" from the fleet inventory instead of guessing.

On agents that expose a hook, such as Claude Code PreToolUse, Anomity returns allow, deny, or log on each tool call before it runs. That is the enforcement point in runtime governance: a tool call that routes to a known-vulnerable PD master, or that reaches a serving endpoint outside policy, can be denied or logged in line rather than discovered after the fact. Anomity collects metadata only and redacts secrets on the endpoint, so it never has to read the credentials a compromised host might expose.
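The allow, deny, or log decision can be pictured as a small policy function evaluated before a tool call runs. The sketch below is illustrative only: it is not Anomity's implementation and does not reproduce the Claude Code hook payload format. The function name decide, the host sets, and the example hostnames are all hypothetical.

```python
from urllib.parse import urlparse


def decide(target_url: str, denied_hosts: set, allowed_hosts: set) -> str:
    """Return "deny", "log", or "allow" for a tool call's target URL."""
    host = urlparse(target_url).hostname or ""
    if host in denied_hosts:
        return "deny"   # known-vulnerable endpoint: block in line
    if host not in allowed_hosts:
        return "log"    # outside policy: let it through but record it
    return "allow"


# Hypothetical policy: one flagged PD master, one approved endpoint.
denied = {"pd-master.example"}
allowed = {"inference.example"}

print(decide("ws://pd-master.example:8000/ws", denied, allowed))   # deny
print(decide("https://inference.example/v1/chat", denied, allowed))  # allow
print(decide("https://unknown.example/v1", denied, allowed))       # log
```

Checking the deny set first means a host that appears in both sets is still blocked, which is the safer ordering while a vulnerability has no fix.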

Every decision is written to a queryable 90-day audit trail. After a disclosure like this, that trail is what lets responders scope the event: which agents called through the engine, when, and what each call was allowed to do. Anomity also routes those decisions to SIEM, Slack, email, or Jira, so the right team sees them in the tool they already use and has the outcomes record responders rely on.

Anomity complements your existing Network, EDR, DLP, and GRC controls rather than replacing them. It adds the agentic-endpoint layer those tools cannot see. See how it works and how Anomity compares for where it fits.

What to check across your fleet

Identify every endpoint and service running LightLLM and record the exact version; treat 1.1.0 and any prior release as affected.
Determine whether PD (prefill-decode) disaggregation is enabled and whether the PD master WebSocket endpoints are reachable from untrusted networks.
Isolate the PD endpoints on a trusted network segment and block external access until a validated fix is available, since there was no vendor fix at disclosure.
Treat any PD master that has been reachable from an untrusted network as a potential compromise and scope it for code execution, not just patching.
Rotate model-provider credentials and any secrets reachable from the LightLLM serving host, since RCE exposes everything that host can read.
Review host and network logs for unexpected WebSocket frames to the PD master and for child processes spawned by the LightLLM process.
Enumerate which AI agents, CLIs, and MCP servers were configured to route inference through the affected engine, using a fleet-wide AI artifact inventory.
Confirm hook-based allow/deny/log enforcement is active on agents that route inference, so calls to a vulnerable PD master can be blocked.
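The first item in the checklist above, recording the exact version and treating 1.1.0 and earlier as affected, can be sketched per host as follows. This assumes the package is installed under the distribution name lightllm, which may not hold for source checkouts or container images; the helper names are hypothetical. Because there was no validated fix at disclosure, versions above 1.1.0 are reported as "unverified" rather than safe.

```python
import re
from importlib import metadata

LAST_KNOWN_AFFECTED = (1, 1, 0)  # from the advisory: 1.1.0 and all prior


def parse_release(version: str) -> tuple:
    """Parse the leading dotted-numeric part, e.g. "1.1.0rc1" -> (1, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    return parts + (0,) * (3 - len(parts))  # pad "1.1" to (1, 1, 0)


def triage(version: str) -> str:
    """Classify a version string as "affected" or "unverified"."""
    if parse_release(version) <= LAST_KNOWN_AFFECTED:
        return "affected"
    return "unverified"  # newer than 1.1.0, but no validated fix is known


def installed_lightllm_status(dist_name: str = "lightllm") -> str:
    """Triage the locally installed distribution (name is an assumption)."""
    try:
        return triage(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return "not installed"


print(triage("1.1.0"))  # prints: affected
print(triage("1.2.0"))  # prints: unverified
```

A "not installed" result only covers the Python environment the script runs in, so it does not replace the fleet-wide inventory described in the checklist.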

CVE-2026-26220 turns one reachable serving host into unauthenticated code execution, which is exactly why the AI artifact layer needs its own inventory and enforcement. For the full cluster context, see the pillar on securing LLM gateways and proxies. To see Anomity inventory your agents, govern tool calls at the hook, and keep a 90-day audit trail, request early access.

Frequently asked questions

What exactly makes CVE-2026-26220 exploitable without authentication?

LightLLM's prefill-decode (PD) master node exposes WebSocket endpoints that receive binary frames and pass the raw bytes straight to pickle.loads(). Python's pickle format can carry executable opcodes, so a crafted frame runs arbitrary code on the host. A nonce-based authentication check did exist, but the nonce defaulted to an empty string, which is falsy in Python, so the guard never ran. With the check effectively disabled and the endpoints network-exposed by design, any attacker who can reach the PD master can send a payload and execute code. There is no valid credential to bypass because no check is enforced.

Is there a patched LightLLM version I can upgrade to?

At disclosure in mid-February 2026, there was no validated vendor fix. LightLLM 1.1.0 and all prior releases are affected, and the project had a history of leaving security reports open, so you should not wait for a release to act. The practical mitigation is to isolate the PD endpoints on a trusted network segment and block all external access to the PD master until a fix you have verified is available. Treat any PD master that is currently reachable from an untrusted network as already exploitable, and scope it as a potential compromise rather than a pending upgrade.

Why are the PD endpoints exposed to the network instead of bound to localhost?

Disaggregated prefill-decode serving splits the two phases of inference across separate processes or hosts so they can scale independently, which means the PD master has to coordinate workers over the network. The LightLLM code even asserts against binding to localhost, so the endpoints are network-exposed by design rather than by misconfiguration. That design choice is reasonable for distributed serving, but combined with an unauthenticated pickle.loads() sink it removes the one boundary that would have contained the flaw. The exposure is structural, which is why network isolation is the only reliable mitigation today.

How does Anomity help when an LLM serving engine like LightLLM is compromised?

Anomity treats the LightLLM serving process as an AI artifact on the endpoint, so it inventories the process, its version, and the local agents and CLIs that route inference through it. On agents that expose a hook, such as Claude Code PreToolUse, Anomity returns allow, deny, or log on each tool call before it runs, so a call routing to a known-vulnerable PD master can be denied or logged in line. Every decision lands in a queryable 90-day audit trail, giving responders the timeline they need to scope an RCE event across the fleet.