Securing LLM Gateways and Proxies: The Complete Guide for Security Teams (2026)
- An LLM gateway is the single chokepoint every AI agent and MCP server routes model calls through - it holds every provider key and a running record of every prompt, which is exactly why one CVSS 9.3 flaw in it compromises the whole traffic plane.
- The dominant bug classes in 2026 are SSRF (CVE-2024-6587, CVE-2026-5530, CVE-2026-33626), unauthenticated RCE (CVE-2026-26220, CVE-2026-22778, CVE-2025-62164), privilege escalation (CVE-2026-47101, CVE-2026-47102), and pre-auth SQLi/auth bypass (CVE-2026-42208, CVE-2026-48710) - not prompt injection.
- Exposure is real and large: 175,000 publicly accessible Ollama hosts across 130 countries, and GreyNoise logged 91,403 attack sessions against LLM endpoints between October 2025 and January 2026.
- Exploitation is fast - LiteLLM's pre-auth SQLi (CVE-2026-42208) was exploited within 36 hours of disclosure and LMDeploy's image-loader SSRF (CVE-2026-33626) within 12 hours.
- Network, EDR, and DLP cannot tell you which endpoints run a gateway, what version, or which agents route through it - that requires an inventory of the AI artifact layer.
- The durable control is allow/deny/log at the agent hook plus a 90-day audit trail, so a call to an admin route or an unapproved registry is governed even when the gateway's own auth is bypassed.
The decision in front of most security teams in 2026 is not whether to allow LLM gateways - application teams already stood them up - but how to govern a chokepoint that holds every provider key and a running record of every prompt. Securing LLM gateways is the problem of bringing a fleet of LiteLLM, vLLM, Ollama, LightLLM, and LMDeploy instances under the same controls as the rest of your infrastructure, after they appeared without a security review. The scale is not theoretical: a joint SentinelLABS and Censys investigation found 175,000 publicly accessible Ollama hosts across 130 countries. This guide maps the gateway and proxy attack surface, the vulnerability classes that define it, and the decision framework for governing it - starting with why traditional controls cannot see this layer.
Treat this as the hub for a cluster of specific cases. Each vulnerability class below links to a worked advisory on a real CVE, and the gateway inventory problem is the thread that connects them. If your agents also route tool calls and data through MCP servers - and most do - read this alongside the sibling MCP server security guide, which covers the tool-and-data half of the same endpoint.
One framing matters before the details. None of the headline gateway bugs in 2026 are prompt injection. They are classic web and infrastructure flaws - SSRF, unsafe deserialization, SQL injection, broken authorization - living in AI-specific code paths. That is good news and bad news: the bug classes are familiar, but the surface they live on reports to no security tool you already run. The fix is the same as for any unmanaged surface: inventory it, govern its calls, and keep a record. Anchor that in the Anomity AI security framework if you are building a program from scratch.
What is an LLM gateway, and why does it concentrate risk?
An LLM gateway, or LLM proxy, fronts many model providers behind one API. An agent or application sends an OpenAI-format request to the gateway; the gateway authenticates the caller, picks an upstream provider, attaches the real provider key, forwards the call, tracks cost, and logs the prompt. LiteLLM is the most common general-purpose example; vLLM, Ollama, LightLLM, and LMDeploy serve models directly and expose OpenAI-compatible endpoints of their own. From the agent's point of view it is one URL and one key. From the org's point of view it is a chokepoint.
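The request path described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code of LiteLLM or any other product: `Upstream`, `build_upstream_request`, the `callers` and `routes` tables, and the `/chat/completions` path layout are all hypothetical names chosen for the example. It makes no network call; it only builds the outbound request so the concentration of secrets is visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Upstream:
    """Hypothetical record for one provider: where to send calls and the real key."""
    base_url: str
    api_key: str


def build_upstream_request(virtual_key, body, callers, routes, audit_log):
    """Authenticate the caller, pick an upstream, attach its key, and log the prompt."""
    # 1. Authenticate: the caller only ever holds a gateway-issued virtual key.
    caller = callers.get(virtual_key)
    if caller is None:
        raise PermissionError("unknown virtual key")

    # 2. Route: the requested model name selects the upstream provider.
    model = body["model"]
    upstream = routes.get(model)
    if upstream is None:
        raise LookupError(f"no route for model {model!r}")

    # 3. Log: the gateway keeps a record of who sent which prompt.
    audit_log.append(
        {"caller": caller, "model": model, "messages": body.get("messages", [])}
    )

    # 4. Attach the real provider key; this is the secret the gateway concentrates.
    return {
        "url": f"{upstream.base_url}/chat/completions",
        "headers": {"Authorization": f"Bearer {upstream.api_key}"},
        "json": body,
    }
```

Every step in that function is a reason the gateway is a target: the `routes` table holds each provider key, and `audit_log` accumulates every prompt.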
That chokepoint concentrates three things an attacker wants. It holds every upstream provider credential, so one compromise yields keys for OpenAI, Anthropic, Bedrock, Vertex, and the rest. It sees every prompt, which on a busy gateway carries source code, customer records, and secrets. And it brokers every model call for the org, so control of the gateway is control of the agent traffic plane. A flaw that would be merely serious in a single application is org-wide when it lands in the service every agent depends on. This is the layer runtime governance is built to cover.
Gateways are also adopted bottom-up. An application team stands up LiteLLM to centralize model access and budget; a developer runs Ollama to test against a local model; a data-science group deploys vLLM for a fine-tuned model. None of these pass through procurement or a security review, which is precisely the shadow-IT dynamic the AI artifact layer creates. The result is a fleet of high-capability services nobody has a full list of.
What are the main vulnerability classes in LLM gateways?
Four classes account for nearly every serious gateway CVE in 2025-2026. Knowing the class tells you what to look for across products, because the same bug recurs in LiteLLM, vLLM, and Ollama for the same reasons. The table maps each class to its mechanism, impact, and a worked advisory.
| Class | Mechanism | Impact | Worked example |
|---|---|---|---|
| SSRF | Caller-controlled URL or registry value makes the gateway fetch from internal services or cloud metadata | Provider key leak, internal recon, metadata theft | CVE-2024-6587, CVE-2026-5530, CVE-2026-33626 |
| Unauthenticated RCE | Unsafe deserialization (pickle, torch.load) or media parsing in an exposed endpoint | Full code execution, no credentials needed | CVE-2026-26220, CVE-2025-62164, CVE-2026-22778 |
| Privilege escalation | Authorization checked on one path but omitted at key creation or user update | Low-privilege account reaches proxy_admin | CVE-2026-47101, CVE-2026-47102 |
| Pre-auth / auth bypass | SQL injection in key verification, or host-header bypass before login | Key theft and access without valid credentials | CVE-2026-42208, CVE-2026-48710 |
SSRF is the most frequent. In the LiteLLM api_base SSRF (CVE-2024-6587) a caller-set api_base made the proxy forward the real provider key to an attacker-controlled host. In the Ollama model-pull SSRF (CVE-2026-5530) a crafted registry URL on the Model Pull API forced outbound requests, and in the LMDeploy vision image-loader SSRF (CVE-2026-33626) the image loader was steered the same way - and exploited against honeypots within 12 hours of disclosure.
Unauthenticated RCE is the most severe. The LightLLM pickle deserialization RCE (CVE-2026-26220) carried CVSS 9.3, where a broken empty-string nonce disabled the only check guarding a pickle load over PD WebSockets. The vLLM prompt-embeds deserialization flaw (CVE-2025-62164) used torch.load() on attacker data, and the vLLM video-URL pre-auth RCE chain (CVE-2026-22778) reached CVSS 9.8 by leaking a heap address through a video_url and then triggering a JPEG2000 overflow.
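The empty-string nonce failure generalizes to a simple rule: a guard must fail closed, and the thing it guards should not be a pickle load. The sketch below illustrates that rule only. It is not LightLLM's code, and `nonce_is_valid` and `load_message` are hypothetical names; it assumes peer messages can be expressed as JSON, which will not hold for every serving protocol.

```python
import hmac
import json


def nonce_is_valid(presented: str, expected: str) -> bool:
    """Return True only when a non-empty expected nonce matches the presented one."""
    # Fail closed: an unset or empty nonce on either side must never validate.
    if not expected or not presented:
        return False
    # Constant-time comparison avoids leaking how much of the value matched.
    return hmac.compare_digest(presented.encode(), expected.encode())


def load_message(raw: bytes, presented_nonce: str, expected_nonce: str) -> dict:
    """Parse a peer message as JSON, and only after the nonce check passes."""
    if not nonce_is_valid(presented_nonce, expected_nonce):
        raise PermissionError("nonce check failed")
    # Unlike pickle.loads, json.loads cannot construct arbitrary objects.
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("expected a JSON object")
    return message
```

The design choice is defense in depth: even if the nonce check were bypassed again, the parser behind it could not execute attacker-supplied code.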
Privilege escalation and pre-auth flaws round out the surface. The LiteLLM allowed_routes privesc (CVE-2026-47101) let an internal_user mint a key with admin-only routes, and the LiteLLM /user/update self-promotion (CVE-2026-47102) let an org_admin set their own role to proxy_admin. The LiteLLM pre-auth SQL injection (CVE-2026-42208) read provider keys before authentication, and the Starlette BadHost host-header auth bypass (CVE-2026-48710) broke access control in the framework under FastAPI, vLLM, LiteLLM, and MCP servers at once.
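Both privilege-escalation cases come down to a missing check at the moment a grant is issued. A minimal sketch of that check follows, assuming a simple model in which each principal has a role and a set of permitted routes. The `Principal` class, the two functions, the example route strings, and the numeric role ranking are illustrative assumptions, not LiteLLM's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    """Hypothetical caller record: a role name and the routes it may use."""
    role: str
    allowed_routes: frozenset[str]


def authorize_key_request(creator: Principal, requested_routes: set[str]) -> set[str]:
    """Return the routes to grant, or raise if the request exceeds the creator's access."""
    # A new key may never carry a route its creator does not already hold.
    excess = requested_routes - creator.allowed_routes
    if excess:
        raise PermissionError(f"routes outside creator scope: {sorted(excess)}")
    return set(requested_routes)


def authorize_role_change(actor: Principal, target_role: str, role_rank: dict[str, int]) -> None:
    """Reject any role change that would grant a role ranked above the actor's own."""
    if role_rank[target_role] > role_rank[actor.role]:
        raise PermissionError(f"{actor.role} cannot grant {target_role}")
```

For example, with an assumed ranking of `{"internal_user": 1, "org_admin": 2, "proxy_admin": 3}`, an `org_admin` asking to become `proxy_admin` is refused at the update endpoint, and an `internal_user` asking for a key with a route it does not hold is refused at key creation.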
Why can't my existing controls see the gateway layer?
Because the exposure lives in the AI artifact layer, and the malicious step usually looks ordinary. When an internal_user mints an over-scoped key in CVE-2026-47101, EDR sees a legitimate process and the network sees normal API traffic to your own gateway. When a video_url triggers CVE-2026-22778, the network sees an inbound request to a model server it has no reason to flag. DLP sees nothing leave the perimeter until the exfiltration is already underway. The controls are working as designed; the surface is simply outside their model.
The gap is concrete. Network and EDR controls can tell you a process is running and a port is open, but not that the process is LiteLLM 1.83.6 with a pre-auth SQLi, or that three agents and two MCP servers route through it, or that a developer minted a key with allowed_routes it should never have held. Those facts require an inventory of gateways as artifacts, with versions and relationships - the same gap covered in how Anomity compares to Network, EDR, DLP, and GRC tooling.
This is also why the mass-scanning data matters. The GreyNoise scanning and SSRF campaign (2025-2026) logged 91,403 attack sessions against exposed model endpoints, fingerprinting which hosts answer and which leak credentials. Attackers are building target lists of exposed gateways at scale; if your own inventory is worse than theirs, you are defending blind.
How fast do these gateway vulnerabilities get exploited?
Fast enough that the patch window is not a safe assumption. The LiteLLM pre-auth SQL injection (CVE-2026-42208) was exploited within 36 hours of disclosure. The LMDeploy image-loader SSRF (CVE-2026-33626) was exploited against honeypots within 12 hours. The GreyNoise campaign concentrated 80,469 of its 91,403 sessions into 11 days. When a gateway is internet-reachable, disclosure-to-exploitation is measured in hours, not weeks.
Worse, some flaws have no vendor fix at disclosure. CVE-2026-26220 had no patch when it was published, and CVE-2026-5530 had no vendor response. For those, patching is not even an option on day one - the only control is governing what the gateway and the agents around it are allowed to do, which is the case for allow/deny/log at the hook rather than waiting on a release.
The other lesson is dependency reach. The Starlette BadHost bypass (CVE-2026-48710) lived in a framework, so a single flaw simultaneously affected FastAPI, vLLM, LiteLLM, and MCP servers - and the LiteLLM MCP-preview RCE (CVE-2026-42271) chained directly with it for unauthenticated RCE. A gateway's exposure is its own code plus everything it depends on.
How do I select and configure an LLM gateway securely?
Selection should weigh the security posture, not only the model coverage and cost features. The questions below separate a gateway you can defend from one you cannot.
- Authentication on every endpoint. Confirm there is no unauthenticated admin, metrics, or debug route - and that the host-header and routing logic cannot be bypassed, as CVE-2026-48710 was.
- Authorization enforced at issuance, not only on the request path. Key creation and user-update endpoints must validate that a grant stays within the creator's permissions; CVE-2026-47101 and CVE-2026-47102 are what happens when they do not.
- SSRF egress controls. Caller-supplied URLs, api_base values, and registry endpoints must be allowlisted; the gateway must not reach internal services or cloud metadata, the core of CVE-2024-6587 and CVE-2026-5530.
- No unsafe deserialization on attacker input. Pickle and torch.load() on request data are the path to RCE in CVE-2026-26220 and CVE-2025-62164; the gateway must avoid them or sandbox them.
- A fast, public patch cadence and a security-advisory channel, since disclosure-to-exploit is measured in hours.
- Inventory and audit hooks, so the gateway and every agent that routes through it can be governed at the call and recorded in a 90-day audit trail.
On configuration, the highest-leverage hardening is the same across products: require authentication in front of every endpoint, restrict model pulls and api_base to a short allowlist of trusted hosts, deny the gateway egress to internal segments and metadata endpoints, and never expose a runtime that was only meant to face localhost. The Anomity docs cover wiring the inventory and hook enforcement that makes these enforceable rather than aspirational.
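The egress rule above (allowlist the target, and refuse internal and metadata addresses) can be sketched as a validator run before the gateway fetches any caller-supplied URL. This is an illustration under assumptions: `check_outbound_url` and `is_public_ip` are hypothetical helpers, the allowlist is supplied by the operator, and only `https` targets are permitted. The `resolve` parameter defaults to the standard library's `socket.getaddrinfo` and exists so the check can be exercised without a live DNS lookup.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_ip(address: str) -> bool:
    """Return False for loopback, private, link-local (metadata), and similar ranges."""
    ip = ipaddress.ip_address(address)
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def check_outbound_url(url: str, allowed_hosts: set[str], resolve=socket.getaddrinfo) -> str:
    """Return the URL if it is safe to fetch; raise ValueError otherwise."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("only https targets are allowed")
    host = (parts.hostname or "").lower()
    # Allowlist first: an unknown host is rejected before any DNS lookup.
    if host not in allowed_hosts:
        raise ValueError(f"host not on allowlist: {host!r}")
    # Then confirm every resolved address sits outside internal ranges.
    for info in resolve(host, parts.port or 443, type=socket.SOCK_STREAM):
        address = info[4][0]
        if not is_public_ip(address):
            raise ValueError(f"{host} resolves to non-public address {address}")
    return url
```

A check like this is necessary but not sufficient. The HTTP client may resolve the name again when it connects, so the resolved address should be pinned for the actual request, redirects should be re-validated or disabled, and a network-level egress deny for internal segments and metadata endpoints should sit behind it.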
How do the major gateways compare on attack surface?
No gateway is uniquely unsafe; each concentrates a different risk because of what it does. The matrix below is a decision aid for where to focus controls per product, grounded in the 2026 CVE record. It is not a ranking - it is a map of where each one has been bitten.
| Gateway / runtime | Primary role | Observed bug classes | Where to focus controls |
|---|---|---|---|
| LiteLLM | General-purpose multi-provider proxy | SSRF, pre-auth SQLi, privesc, MCP-preview RCE | Key issuance authz, egress allowlist, version patching |
| vLLM | High-throughput model server (OpenAI-compatible) | Deserialization RCE, trust_remote_code RCE, media-parse RCE | Disable trust_remote_code, restrict multimodal inputs, network isolation |
| Ollama | Local-first model runtime | Model-pull SSRF, unauthenticated memory leak | Authentication, registry allowlist, no public exposure |
| LightLLM | PD-disaggregated serving | Unauthenticated pickle deserialization RCE | Block WebSocket exposure, no pickle on input, network isolation |
| LMDeploy | Vision-language model serving | Image-loader SSRF | Egress controls on image/media fetches, version patching |
Two patterns cut across the matrix. First, the local-first runtimes (Ollama, vLLM) are the ones most often exposed by accident, which is why the GreyNoise campaign targeted them and why 175,000 Ollama hosts ended up public. Second, the multimodal and serving features (video, vision, prompt-embeds, PD WebSockets) are where RCE concentrates, because they parse complex attacker-controlled input - and config defaults matter too, as the vLLM trust_remote_code override (CVE-2026-27893) showed by hardcoding trust_remote_code=True past the operator opt-out. Map your fleet against this matrix and you know where the next bug is likely to land - which presupposes you have the fleet inventory to map against.
What about edge cases - internal-only gateways and ephemeral runtimes?
Two edge cases break the "just put it behind the firewall" instinct. The first is the internal-only gateway that an SSRF turns into a pivot. CVE-2024-6587 and the Ollama model-pull SSRF make the gateway itself issue the outbound request, so the attacker reaches internal services and cloud metadata from inside your perimeter even though they never touched those hosts directly. "Internal-only" does not mean "cannot be made to reach inward."
The second is the ephemeral runtime. A vLLM container in CI, an Ollama instance a developer starts for an afternoon, a gateway baked into a base image - these never appear in a static asset list but are live long enough to be scanned and exploited, given a 12-hour exploitation window. Point-in-time network scans miss them; continuous endpoint inventory does not. The same applies to a gateway bound to localhost that an agent reaches but no external scan ever sees.
Both cases share a root cause: the unit of risk is the artifact on the endpoint, not the host on a network diagram. Governing the agent's call - the allow/deny/log decision at the hook - covers the ephemeral and the internal-only case alike, because it does not depend on the gateway being reachable from outside or surviving long enough to land in an asset database.
How Anomity governs LLM gateways and proxies
Anomity treats an LLM gateway as an agentic endpoint artifact and governs it in three concrete steps. The structure matters because the gateway's own authorization can be bypassed - by an SSRF, an over-scoped key, or a host-header trick - so the reliable place to enforce policy is the agent's call, not a credential the gateway has been tricked into trusting.
First, inventory. Anomity inventories every LLM gateway and proxy on every managed endpoint as one of eight AI artifact types - alongside AI agents, MCP servers, extensions, skills, plugins, secrets, hooks, and CLIs. It records the exact version of LiteLLM, vLLM, Ollama, LightLLM, or LMDeploy, so finding every install of LiteLLM before 1.83.7 (the CVE-2026-42208 fix) or vLLM before 0.14.1 (the CVE-2026-22778 fix) is a single query rather than a fleet sweep. It captures the relationships - which agents, CLIs, and MCP servers route through each gateway - and classifies each by the capability it brokers. Metadata only: secret values are redacted on the endpoint before anything leaves it.
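To make the "single query" idea concrete, here is a minimal sketch of a version check over an inventory. It is not Anomity's query interface: `Install`, `FIXED_IN`, `parse_version`, and `vulnerable_installs` are hypothetical, and the parser assumes plain dotted numeric versions. The two fixed releases are the ones named above.

```python
from dataclasses import dataclass

# Fixed releases as stated in this guide; extend per advisory.
FIXED_IN = {
    ("litellm", "CVE-2026-42208"): (1, 83, 7),
    ("vllm", "CVE-2026-22778"): (0, 14, 1),
}


@dataclass(frozen=True)
class Install:
    """Hypothetical inventory row: one product at one version on one endpoint."""
    endpoint: str
    product: str
    version: str


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.83.10' into (1, 83, 10) so comparison is numeric, not lexical."""
    return tuple(int(part) for part in text.split("."))


def vulnerable_installs(inventory: list[Install], product: str, cve: str) -> list[Install]:
    """Return every install of the product that is below the fixed release for the CVE."""
    fixed = FIXED_IN[(product, cve)]
    return [
        item
        for item in inventory
        if item.product == product and parse_version(item.version) < fixed
    ]
```

Comparing parsed tuples matters: as strings, "1.83.10" sorts before "1.83.7" and would be flagged as vulnerable by mistake. Versions with pre-release or local suffixes would need a fuller parser than this one.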
Second, decide at the hook. On agents that expose a hook - for example the PreToolUse event in Claude Code - Anomity evaluates each tool call against your policy and returns allow, deny, or log before the call runs. An agent reaching an admin-only proxy route, a key-management endpoint, a model-configuration path, or a model pull to a registry outside your allowlist is checked at the call boundary. That means an over-scoped key from CVE-2026-47101, or a pull that would trigger the CVE-2026-5530 SSRF, is governed by your policy even when the gateway's own auth check is the thing that failed. This is the enforcement point runtime governance provides.
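The decision itself can be pictured as a small pure function. The sketch below is illustrative only and is not Anomity's policy engine: the `Policy` shape, the `kind` labels, the admin path prefixes, and the rule that unmatched calls are logged are all assumptions made for the example. How the verdict is handed back to a particular agent's hook is deliberately left out.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: admin path prefixes to block and registries to permit."""
    admin_path_prefixes: tuple[str, ...]
    allowed_registries: frozenset[str]


def decide(policy: Policy, kind: str, target_url: str) -> str:
    """Return 'allow', 'deny', or 'log' for one agent call before it runs."""
    parts = urlsplit(target_url)
    host = (parts.hostname or "").lower()

    if kind == "model_pull":
        # Pulls are permitted only from registries on the allowlist.
        return "allow" if host in policy.allowed_registries else "deny"

    if kind == "gateway_request":
        # Admin and key-management routes are denied regardless of the key presented.
        if any(parts.path.startswith(prefix) for prefix in policy.admin_path_prefixes):
            return "deny"
        return "log"

    # Anything unclassified proceeds but is recorded for review.
    return "log"
```

Because the verdict depends only on what the agent is about to call, it holds even when the gateway would have accepted the request, which is the point of enforcing at the call boundary.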
Third, keep the record. Every install, version change, and decision lands in a queryable 90-day audit trail, and decisions route to SIEM, Slack, email, or Jira so the right team sees them where they already work. When responders ask which gateways ran a vulnerable build, who minted which key, and what admin routes or registries were reached, you answer from a record rather than a reconstruction. Anomity is SOC 2 Type II and complements - does not replace - your Network, EDR, DLP, and GRC tooling; it adds the agentic-endpoint layer those tools were never built to inventory. See how it works for the end-to-end flow.
You can't govern what you can't see. - The Anomity principle
What should security teams do next?
Start with the inventory question, because every other control depends on it. Work through the list below in order; the first three items close the largest share of the exposure described in this guide.
- Inventory every endpoint, container image, and CI runner for LiteLLM, vLLM, Ollama, LightLLM, and LMDeploy, and record the version of each - including runtimes bound to localhost or living only in CI.
- Patch to the fixed releases: LiteLLM 1.83.14+ (privesc), 1.83.10+ (self-promote), 1.83.7+ (SQLi and MCP-preview RCE), 1.44.8+ (SSRF); vLLM 0.18.0+, 0.14.1+, 0.11.1+; Ollama 0.17.1+; LMDeploy 0.12.3+; Starlette 1.0.1+.
- Remove accidental exposure: require authentication in front of every gateway endpoint and take any runtime that was only meant to face localhost off the public internet.
- Lock down egress: allowlist model registries and api_base targets, and deny the gateway access to internal services and cloud metadata endpoints.
- Audit existing keys and roles for grants that exceed the holder's permissions, and rotate any provider credentials reachable from a gateway that ran a vulnerable build.
- Govern agent calls to admin routes, key-management endpoints, and model registries with allow/deny/log at the hook, and write every gateway change and governed call to a 90-day audit trail routed to your SIEM.
If you are formalizing this into a program, the agentic AI governance guide puts the gateway layer in the context of the wider AI artifact estate, and the AI security framework gives the control structure to map it to.
The decision framework for securing LLM gateways comes down to three moves applied in order: inventory every gateway and its version as an artifact on the endpoint, patch fast but assume the window is short and some flaws have no fix, and govern the agent's call at the hook with a 90-day record so policy holds even when the gateway's own auth is bypassed. With 175,000 Ollama hosts already exposed and exploitation measured in hours, the gateway layer is not a future problem. To see Anomity inventory your gateways, govern tool calls at the hook, and keep the audit trail across your fleet, request early access.
Frequently asked questions
What is an LLM gateway and why does it concentrate so much risk?
An LLM gateway, or LLM proxy, is a service that fronts many model providers behind one API. Tools such as LiteLLM, vLLM, Ollama, LightLLM, and LMDeploy let agents and applications call any model through a single endpoint, with key management, cost tracking, and routing handled centrally. That centralization is the point and the problem. The gateway holds every upstream provider credential and brokers every model call in the org, so it sees a running record of prompts that routinely carry source code, customer data, and secrets. Compromise the gateway and you compromise the keys, the prompt history, and the traffic plane every agent depends on - which is why a single flaw in it has org-wide blast radius.
What are the most common vulnerability classes in LLM gateways?
Four classes dominate the 2026 record. Server-side request forgery is the most frequent: a caller-controlled URL or registry value makes the gateway fetch from internal services or cloud metadata, as in CVE-2024-6587, CVE-2026-5530, and CVE-2026-33626. Unauthenticated remote code execution comes from unsafe deserialization and media parsing, as in CVE-2026-26220, CVE-2025-62164, and CVE-2026-22778. Privilege escalation lets a low-privilege account reach admin control, as in CVE-2026-47101 and CVE-2026-47102. Pre-auth flaws such as the SQL injection in CVE-2026-42208 and the host-header auth bypass in CVE-2026-48710 break access control before login. Prompt injection is not on this list - these are classic web and infrastructure bugs in AI-specific code.
How exposed are LLM gateways on the public internet right now?
More exposed than most teams assume. A joint SentinelLABS and Censys investigation found 175,000 publicly accessible Ollama hosts across 130 countries, and separate scans counted roughly 300,000 servers at risk from the Bleeding Llama memory leak (CVE-2026-7482). GreyNoise honeypots logged 91,403 attack sessions against Ollama and OpenAI-compatible endpoints between October 2025 and January 2026, with 80,469 of them in a single 11-day burst. Exposure is rarely deliberate: a developer forwards a port, a container ships with a bound interface, or an internal gateway gets a public load balancer. The first defensive step is knowing which endpoints run a gateway at all, which is an inventory problem before it is a firewall problem.
Why is patching alone not enough for LLM gateway security?
Patching is necessary and you should do it, but two gaps remain. First, speed: LiteLLM's pre-auth SQL injection was exploited within 36 hours of disclosure and LMDeploy's image-loader SSRF within 12 hours, so there is a real window where a known-vulnerable build is still running. Second, coverage: you cannot patch what you have not inventoried, and gateways are stood up bottom-up by application teams without a security review. Some flaws also have no vendor fix at disclosure, like CVE-2026-26220 and CVE-2026-5530. The durable control is governing the call itself - allow, deny, or log at the agent hook - so an agent reaching an admin route or an unapproved registry is checked regardless of whether the gateway is patched.
How do I find every LLM gateway running across my fleet?
You enumerate the AI artifact layer on every managed endpoint, not just the hosts you remember standing up. That means scanning for the gateway process and version (LiteLLM, vLLM, Ollama, LightLLM, LMDeploy), the OpenAI-compatible servers behind them, and the agents, CLIs, and MCP servers configured to route through each one. Network scans find what is listening on a port today; they miss a gateway bound to localhost that an agent reaches, or a container image that ships one. Anomity inventories LLM gateways as one of the eight AI artifact types it tracks per endpoint and records the installed version, so finding everything below a fixed release is a single query rather than a fleet sweep.
Does runtime governance replace my WAF, EDR, or network controls?
No. Anomity complements Network, EDR, DLP, and GRC tooling rather than replacing it. Those tools see the inbound scan, the running process, and the outbound connection, but they cannot tell you which endpoints run a gateway, what version, which agents drive it, or whether a key creation crossed a role boundary inside the application's own auth model. Runtime governance adds the missing layer: it inventories the gateway, decides allow/deny/log on each tool call at the agent hook before it runs, and keeps a queryable 90-day audit trail. The web bug classes here still warrant a WAF and network egress controls; runtime governance covers the agentic-endpoint layer those controls were never built to see.
What is the relationship between LLM gateways and MCP server security?
They are adjacent surfaces that increasingly overlap. An MCP server gives an agent tools and data; an LLM gateway gives it models. Both are AI artifacts that appear on endpoints without review, and the lines are blurring - LiteLLM's CVE-2026-42271 was a command injection in its own MCP-preview endpoints, chained to unauthenticated RCE. An agent that routes model calls through a gateway and tool calls through an MCP server depends on both being inventoried and governed. The same allow/deny/log decision at the hook covers a model-config call to a gateway and a tool call to an MCP server. For the tool-and-data half of the picture, see the MCP server security guide; this guide covers the model-routing half.
How does Anomity help secure LLM gateways specifically?
In three steps. Anomity inventories every LLM gateway and proxy on every managed endpoint as one of eight AI artifact types, surfacing the exact version so you can find every install below a fixed release in one query. On agents that expose a hook, such as Claude Code PreToolUse, it evaluates each tool call against your policy and returns allow, deny, or log before the call runs, so a call to an admin route, a key-management endpoint, or an unapproved model registry is checked at the call boundary even when the gateway's own auth is bypassed. Every install, version change, and decision lands in a queryable 90-day audit trail that routes to SIEM, Slack, email, or Jira. Anomity collects metadata only, with secret redaction on the endpoint, and is SOC 2 Type II.