LMDeploy Vision Image-Loader SSRF Exploited in 12 Hours - CVE-2026-33626
CVE-2026-33626 is a server-side request forgery (SSRF) in LMDeploy, the InternLM serving toolkit for vision-language and text models. It is tracked as GHSA-6w67-hwm5-92mq, rated High, and was disclosed on April 21, 2026. All releases before 0.12.3 are affected; the fix is in 0.12.3.
What happened
The load_image() function in lmdeploy/vl/utils.py fetches image URLs supplied to a vision-language endpoint without validating the destination. It does not reject internal or private IP ranges, so any URL an attacker can place into a multimodal request becomes a server-side fetch. That turns the vision image loader - a feature meant to pull a picture for the model to describe - into a generic HTTP SSRF primitive that issues requests from the inference server's network position.
Sysdig observed the first exploitation attempt against honeypots roughly 12 hours and 31 minutes after the advisory published. Over an eight-minute session, attackers port-scanned the internal network and reached AWS IMDS (169.254.169.254), Redis, MySQL, a secondary admin HTTP interface, and an out-of-band DNS exfiltration endpoint. The sequence is a textbook SSRF-to-cloud chain: enumerate the internal network, pull instance credentials, then pivot to data stores and a second control plane.
The stakes are set by where these models run. Vision-LLM nodes typically sit on GPU instances with broad IAM roles for S3 model artifacts and cross-account assume-role. A single successful IMDS fetch returns temporary credentials for that role, so one SSRF hit can compromise the cloud account rather than just the inference process. The case is a clean illustration of how fast disclosed inference-server SSRF flaws are weaponized - and why the patch deadline is effectively the disclosure timestamp.
Why this is an agentic-endpoint risk
LMDeploy is the kind of AI artifact that lands on developer and inference endpoints quietly. It is installed to serve a model, often by an ML team moving fast, and it rarely appears in the asset inventories your network, EDR, and DLP tools were built around. That is exactly the blind spot Anomity inventories: a serving toolkit running on a GPU box, holding cloud credentials, accepting external input. It is shadow AI infrastructure with the same governance gap as an unvetted MCP server.
The artifact layer matters here because the exposure is not in the model weights or the prompt - it is in the software wrapped around the model and the permissions it inherits. An agent or pipeline that calls a vision endpoint can pass a hostile URL downstream without anyone noticing, and the inference node will dutifully fetch it. Anomity treats LMDeploy, the agents that call it, and the credentials it holds as first-class inventory items rather than untracked background processes. See how that maps to the eight artifact types under fleet visibility.
This is the same pattern as the sibling advisory on the LiteLLM pre-auth SQLi (CVE-2026-42208): an LLM-serving component, reachable input, and a flaw that converts a normal feature into a network-facing primitive. The cluster theme is consistent - gateways and proxies in front of models are attack surface, and you cannot reason about them if you do not know they exist on your fleet.
How Anomity surfaces and governs it
First, inventory. Anomity runs on every managed endpoint and classifies the AI artifacts present, including inference toolkits like LMDeploy, the MCP servers and CLIs around them, and the agents that invoke them. That answers the first question after any disclosure: which of my endpoints run the affected software, at what version. Fleet visibility makes the LMDeploy footprint and its version a query, not a fire drill across teams.
Second, runtime governance. On agents that expose a hook - for example, a Claude Code PreToolUse hook - Anomity returns allow / deny / log on each tool call before it runs. When an agent is about to invoke a vision endpoint or fetch a URL on behalf of a model, that call is a decision point Anomity can evaluate and block, rather than a request that silently leaves the host. Runtime governance is where an attacker-supplied URL meets a policy instead of an open socket.
Third, the record. Anomity keeps a queryable 90-day audit trail and collects metadata only, with on-endpoint secret redaction. After an incident - or to prove one did not occur - you can query which agents touched the vision endpoint, what they were allowed or denied, and when. Decisions route to SIEM, Slack, email, and Jira, so the SOC sees them where it already works. The durable audit trail turns a 12-hour exploitation window into something you can actually reconstruct. Anomity is SOC 2 Type II and complements - does not replace - your Network, EDR, DLP, and GRC controls.
What to check across your fleet
- Inventory every endpoint running LMDeploy and record the version; treat anything before 0.12.3 as exposed and upgrade to 0.12.3 or later.
- Identify which inference nodes serve vision-language models and accept image URLs as input - these are the directly exploitable surfaces for
load_image(). - Audit the IAM role on each GPU inference instance; scope it down from broad S3 and cross-account assume-role permissions to the minimum the workload needs.
- Enforce IMDSv2 with a hop limit of 1 so an SSRF cannot trivially read instance credentials from 169.254.169.254.
- Restrict egress from inference nodes so they cannot reach internal data stores (Redis, MySQL), admin interfaces, or arbitrary external DNS endpoints.
- Confirm agents that call vision endpoints pass through a hook where allow/deny/log can evaluate the URL before the fetch executes.
- Query the 90-day audit trail for any prior agent activity against vision endpoints during the post-disclosure window.
For the full picture of how gateways and proxies in front of models become attack surface - and how to inventory and govern them - see the pillar on securing LLM gateways and proxies, the How It Works walkthrough, and the agentic AI governance guide. If you want LMDeploy and every other AI artifact on your fleet inventoried, governed at the hook, and recorded for 90 days, request early access.
Frequently asked questions
What is CVE-2026-33626 and which versions are affected?
CVE-2026-33626 (GHSA-6w67-hwm5-92mq) is a server-side request forgery in LMDeploy, the InternLM serving toolkit for vision-language and text models. The load_image() function in lmdeploy/vl/utils.py fetches arbitrary URLs without validating internal or private IP addresses, turning the vision image loader into a generic HTTP SSRF primitive. An attacker who can submit an image URL to a vision-language endpoint can force the server to make requests to internal services. All releases before 0.12.3 are affected; the fix landed in 0.12.3. Upgrade to 0.12.3 or later.
How quickly was it exploited?
Sysdig observed the first exploitation attempt against honeypots roughly 12 hours and 31 minutes after the advisory published on April 21, 2026. Over an eight-minute session, attackers port-scanned the internal network and hit AWS IMDS, Redis, MySQL, a secondary admin HTTP interface, and an out-of-band DNS exfiltration endpoint. The window between disclosure and weaponization for inference-server SSRF flaws is now measured in hours, not weeks - which is why fleet-wide inventory of which endpoints even run LMDeploy is the prerequisite for any timely response.
Why is an SSRF on a GPU inference node so dangerous?
Vision-LLM nodes typically run on GPU instances with broad IAM roles for S3 model artifacts and cross-account assume-role. A single successful IMDS fetch (169.254.169.254) returns temporary credentials scoped to that role, so one SSRF hit can compromise the cloud account, not just the inference process. The honeypot session showed attackers reaching exactly these targets: IMDS for credentials, then Redis and MySQL for data, then a second admin interface for lateral movement. The blast radius is the IAM role, not the model server.
Does upgrading to 0.12.3 fully resolve the risk?
Upgrading to 0.12.3 fixes this specific URL-fetch flaw, and you should patch every affected node. But the underlying exposure - an AI inference component making outbound requests on attacker-influenced input - recurs across serving toolkits and gateways. Patching one CVE does not give you a standing inventory of which endpoints run which inference software, what IAM roles they hold, or a record of what they did. Anomity provides the durable layer: continuous inventory, runtime allow/deny/log on agent tool calls, and a queryable 90-day audit trail.