
Mass Scanning and SSRF Campaign Against Exposed LLM Infrastructure - GreyNoise 2025-2026

Category: LLM Gateways & Proxies · Severity: High · Identifier: GreyNoise LLM honeypot campaign (Oct 2025 - Jan 2026)
Affected: Internet-exposed Ollama and OpenAI-compatible LLM endpoints/gateways

Between October 2025 and January 2026, GreyNoise's Ollama honeypot infrastructure captured 91,403 attack sessions against emulated Ollama and OpenAI-compatible LLM endpoints. The traffic was not steady: 80,469 of those sessions landed in the 11 days from December 28, 2025 to January 8, 2026. GreyNoise tracked two distinct campaigns running in parallel and assessed the activity as a professional threat actor conducting reconnaissance. This advisory covers what was observed, why scanning of exposed model infrastructure is an agentic-endpoint problem, and how Anomity surfaces and governs the agents and runtimes that sit behind it.

What happened

The first campaign was mass enumeration. It probed more than 73 LLM model endpoints across OpenAI-compatible and Google Gemini API formats, fingerprinting models such as GPT, Claude, Llama, Gemini, DeepSeek, Mistral, Qwen, and Grok. The aim was to find servers misconfigured to leak access to commercial AI APIs: which exposed hosts answer, what model they front, and which carry usable credentials. The output is a target list, not an immediate breach.
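One way to spot this kind of sweep in your own logs is to count how many distinct model names each source requests. The sketch below is a minimal illustration, not GreyNoise's detection logic: the `(source_ip, model_name)` record shape, the function name, and the threshold of 10 are all assumptions you would adapt to your own gateway logs.

```python
from collections import defaultdict

def find_enumeration_sources(requests, threshold=10):
    """Return {source_ip: distinct_model_count} for every source that
    requested at least `threshold` distinct model names."""
    models_by_source = defaultdict(set)
    for source_ip, model_name in requests:
        # Normalise case so "GPT-4o" and "gpt-4o" count as one model.
        models_by_source[source_ip].add(model_name.lower())
    return {
        ip: len(models)
        for ip, models in models_by_source.items()
        if len(models) >= threshold
    }

# Hypothetical parsed log records: (source IP, model name from the request).
sample = [("203.0.113.7", f"model-{i}") for i in range(12)]
sample += [("198.51.100.4", "llama3"), ("198.51.100.4", "llama3")]

flagged = find_enumeration_sources(sample, threshold=10)
print(flagged)  # {'203.0.113.7': 12}
```

A legitimate client usually calls one or two models repeatedly, so a single source cycling through many model names in a short window is a reasonable signal to review, though the right threshold depends on your traffic.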

The second campaign was narrower and sharper. Over a 48-hour window it generated 1,688 sessions abusing Ollama's model-pull functionality to exploit server-side request forgery through malicious registry URLs. Because the server makes the outbound request, a crafted registry value lets an attacker steer Ollama to reach internal services and cloud instance metadata endpoints the attacker could not touch directly.
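The trusted-registry defense comes down to deciding which host a model reference would make the server contact before any pull is issued. The following is a rough sketch under stated assumptions: it treats `registry.ollama.ai` as the default registry, uses a simplified `host/namespace/model:tag` parsing heuristic rather than Ollama's own parser, and the names `registry_host` and `is_pull_allowed` are hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: the default registry used when a reference names no host.
DEFAULT_REGISTRY = "registry.ollama.ai"
# Assumption: your own short allowlist of trusted registries.
ALLOWED_REGISTRIES = {DEFAULT_REGISTRY}

def registry_host(model_ref: str) -> str:
    """Best-effort guess at the registry host a model reference points to."""
    ref = model_ref.strip()
    if "://" in ref:
        # A full URL: take the hostname, or "" if it cannot be parsed.
        try:
            return (urlsplit(ref).hostname or "").lower()
        except ValueError:
            return ""
    parts = ref.split("/")
    first = parts[0]
    looks_like_host = len(parts) >= 3 or (
        len(parts) == 2 and ("." in first or ":" in first or first == "localhost")
    )
    if not looks_like_host:
        return DEFAULT_REGISTRY
    # Drop an optional ":port" suffix from the host component.
    return first.rsplit(":", 1)[0].lower() if ":" in first else first.lower()

def is_pull_allowed(model_ref: str) -> bool:
    """Allow a pull only when the resolved registry is on the allowlist."""
    return registry_host(model_ref) in ALLOWED_REGISTRIES

for ref in ("llama3:latest", "attacker.example:8443/ns/model",
            "http://169.254.169.254/latest/meta-data"):
    print(ref, "->", "allow" if is_pull_allowed(ref) else "deny")
```

The check fails closed: anything that does not clearly resolve to an allowlisted registry is denied, which is the safer default when the input may have been crafted.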

GreyNoise assessed the activity as the work of a professional threat actor building target lists for future exploitation of exposed AI APIs, not as opportunistic noise. The recommended defenses are operational: restrict model pulls to trusted registries, block out-of-band callbacks at the DNS layer, and monitor JA4 fingerprints to spot the automated tooling. Attackers also scan for misconfigured proxies so they can run inference on someone else's paid LLM account. This abuse class, known as LLMjacking, is the monetization step that this reconnaissance feeds.

Detail | Value
Identifier | GreyNoise LLM honeypot campaign (Oct 2025 - Jan 2026)
Total sessions | 91,403 (Oct 2025 - Jan 2026)
Peak concentration | 80,469 sessions in 11 days (Dec 28, 2025 - Jan 8, 2026)
Campaign 1 | Enumeration of 73+ LLM endpoints (OpenAI-compatible + Gemini formats)
Campaign 2 | 1,688 sessions over 48h abusing Ollama model-pull SSRF via registry URLs
Assessment | Professional threat actor building target lists for future exploitation
Defenses | Trusted-registry allowlist, block out-of-band DNS, monitor JA4 fingerprints

Why this is an agentic-endpoint risk

An exposed model endpoint rarely sits alone. Ollama and OpenAI-compatible gateways exist because AI agents, CLIs, and developer tooling want a model they can call. On a managed endpoint, the runtime process is an AI artifact in its own right, and so are the Claude Code sessions, MCP servers, and command-line agents that point at it. When an agent can trigger a model pull, it reaches the very API this campaign abused - and when a developer forwards a port to a local runtime, that endpoint joins the internet-facing surface the scanners are sweeping.

That reachability is the risk. Enumeration finds the host; the model-pull SSRF turns a single pull request into a path to internal services and metadata endpoints; LLMjacking turns a leaked credential into compute billed to you. Network and EDR controls see the inbound scan and the outbound connection, but cannot tell you which endpoints run an exposed model runtime, which agents drive it, or whether any were allowed to pull from an unapproved registry. That is the artifact-layer blind spot.

We track the same blind spot across the gateway cluster, including the sibling cases in Ollama model pull API SSRF - CVE-2026-5530, LiteLLM api_base SSRF - CVE-2024-6587, and LiteLLM pre-auth SQL injection - CVE-2026-42208. Each runtime is one node in a graph of AI artifacts, and you can't govern what you can't see. Fleet-wide inventory of every AI artifact is the precondition for knowing what the scanners can find.

How Anomity surfaces and governs it

Anomity inventories eight AI artifact types on every managed endpoint: AI agents, MCP servers, extensions, skills, plugins, secrets, hooks, and CLIs. For this campaign that means the Ollama or OpenAI-compatible process and its version are catalogued alongside the agents and CLIs that drive it, so you can answer "which endpoints expose a model runtime, and what talks to it" from the fleet inventory rather than from someone else's scan results.

On agents that expose a hook, such as Claude Code PreToolUse, Anomity returns allow, deny, or log on each tool call before it runs. That is the enforcement point in runtime governance: a tool call that triggers a model pull to a registry outside an approved allowlist, or that routes inference to an unsanctioned endpoint, can be denied or logged in line rather than found after the forged request reached an internal service. Anomity collects metadata only and redacts secrets on the endpoint, so the very API keys an LLMjacking operator hunts for never pass through Anomity.
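To make the allow, deny, or log decision concrete, here is an illustrative policy function. It is a sketch under assumptions, not Anomity's implementation and not a statement of the Claude Code hook contract: the `tool_name` and `tool_input.command` field names, the approved-registry set, and the simplified registry parsing are all hypothetical, and wiring the function into an actual hook is left out.

```python
import shlex

# Assumptions: default registry name and the approved set are illustrative.
DEFAULT_REGISTRY = "registry.ollama.ai"
APPROVED_REGISTRIES = {DEFAULT_REGISTRY}

def registry_of(model_ref: str) -> str:
    """Simplified guess at the registry host in a model reference."""
    parts = model_ref.split("/")
    if len(parts) >= 3 or (len(parts) == 2 and "." in parts[0]):
        return parts[0].split(":")[0].lower()
    return DEFAULT_REGISTRY

def decide(tool_call: dict) -> str:
    """Return 'allow', 'deny', or 'log' for a hypothetical tool-call record."""
    if tool_call.get("tool_name") != "Bash":
        return "allow"
    command = (tool_call.get("tool_input") or {}).get("command", "")
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "log"  # unparseable command line: record it for review
    if tokens[:2] == ["ollama", "pull"] and len(tokens) >= 3:
        approved = registry_of(tokens[2]) in APPROVED_REGISTRIES
        return "allow" if approved else "deny"
    if "/api/pull" in command:
        # A pull issued through the HTTP API: this sketch does not parse
        # the request body, so it only records the call.
        return "log"
    return "allow"

call = {"tool_name": "Bash",
        "tool_input": {"command": "ollama pull evil.example/ns/model"}}
print(decide(call))  # deny
```

The three outcomes map onto the decision types described above: a pull from an unapproved registry is denied before it runs, an ambiguous pull path is logged, and everything else passes through.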

Every decision is written to a queryable 90-day audit trail. After a campaign like this, that trail is what lets responders scope the event: which agents initiated a pull, when, what registry each call named, and which endpoints inference was routed to. Anomity routes those decisions to SIEM, Slack, email, or Jira so the right team sees them in the tool they already use. The result is the timeline and the enforcement record described under outcomes.

Anomity complements your existing Network, EDR, DLP, and GRC controls rather than replacing them. It adds the agentic-endpoint layer those tools cannot see. See how it works and how Anomity compares for where it fits.

What to check across your fleet

  • Enumerate every endpoint running Ollama or an OpenAI-compatible model server and record which are reachable from outside the host, including forwarded ports and tunnels.
  • Restrict model pulls to a short allowlist of trusted registries so the pull API cannot be steered to an arbitrary host through a crafted registry URL.
  • Block DNS resolution of out-of-band callback domains at the resolver layer to cut the channel attackers use to confirm the SSRF and exfiltrate data through it.
  • Require authentication in front of any model endpoint and remove exposure of runtimes never meant to face untrusted callers.
  • Deny the runtime egress to internal services, cloud instance metadata endpoints, and isolated segments it has no business reaching.
  • Review inbound logs for enumeration sweeps against many model-name paths, and monitor JA4 fingerprints to flag the automated tooling.
  • Rotate any commercial AI API keys reachable from an exposed endpoint and watch for billing anomalies that signal LLMjacking.
  • Enumerate which AI agents, CLIs, and MCP servers can trigger a pull or route inference through an exposed runtime, using a fleet-wide inventory.
  • Confirm hook-based allow/deny/log enforcement is active on agents that drive model traffic, so a pull to an unapproved registry can be blocked before it runs.
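
The egress item in the checklist above can be sketched as a destination classifier. This is a minimal illustration using Python's standard `ipaddress` module, assuming destinations have already been resolved to IP addresses; the function name and verdict strings are hypothetical, and real enforcement belongs in a firewall or network policy rather than in application code.

```python
import ipaddress

# 169.254.169.254 is the link-local address commonly used for cloud
# instance metadata services.
METADATA_ADDRESSES = {ipaddress.ip_address("169.254.169.254")}

def egress_verdict(destination_ip: str) -> str:
    """Classify an already-resolved destination IP for a model runtime."""
    addr = ipaddress.ip_address(destination_ip)
    if addr in METADATA_ADDRESSES or addr.is_link_local:
        return "deny: metadata/link-local"
    if addr.is_loopback or addr.is_private:
        return "deny: internal"
    return "allow"

for ip in ("169.254.169.254", "10.0.0.5", "8.8.8.8"):
    print(ip, "->", egress_verdict(ip))
```

Checking the resolved address rather than the hostname matters here, because a crafted registry URL can use a hostname that resolves to an internal address.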

The GreyNoise campaign turns reconnaissance against exposed model infrastructure into ready target lists, with the model-pull SSRF and LLMjacking close behind - which is why the AI artifact layer needs its own inventory and enforcement. For the full cluster context, see the pillar on securing LLM gateways and proxies. To see Anomity inventory your agents, govern tool calls at the hook, and keep a 90-day audit trail, request early access.

Frequently asked questions

What was the GreyNoise LLM honeypot campaign measuring?

GreyNoise ran honeypot infrastructure that emulated Ollama and OpenAI-compatible LLM servers and recorded who connected to it. Between October 2025 and January 2026 it captured 91,403 attack sessions, with 80,469 of them concentrated in the 11 days from December 28, 2025 to January 8, 2026. The traffic split into two parallel campaigns: a mass enumeration effort probing more than 73 model endpoints to fingerprint exposed servers, and a 48-hour burst of 1,688 sessions abusing Ollama's model-pull functionality to trigger server-side request forgery through malicious registry URLs. GreyNoise assessed the activity as a professional threat actor building target lists.

Is this scanning targeting a specific CVE, or any exposed LLM server?

The enumeration campaign was not tied to a single CVE. It probed more than 73 LLM model endpoints across OpenAI-compatible and Google Gemini API formats, fingerprinting models such as GPT, Claude, Llama, Gemini, DeepSeek, Mistral, Qwen, and Grok to find any server misconfigured to leak access to commercial AI APIs. The goal was reconnaissance: identify which exposed hosts answer, what they front, and which carry usable credentials. The parallel model-pull burst did target a specific abuse class, the SSRF reachable through Ollama's pull API. Treat any internet-reachable Ollama or OpenAI-compatible endpoint as in scope rather than waiting for a CVE match.

What is LLMjacking and how does it relate to this campaign?

LLMjacking is an abuse class where attackers scan for misconfigured proxies and exposed model endpoints, then ride the victim's paid LLM accounts to run inference on someone else's bill. It is the monetization step that the reconnaissance in this campaign feeds. Once a scan fingerprints a server that leaks access to a commercial AI API, the credentials or open proxy behind it can be resold or used directly for free compute. Because the cost lands on the account owner and the traffic looks like normal API usage, LLMjacking often goes unnoticed until a billing spike. Constraining which endpoints agents may reach limits both the discovery and the abuse.
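
Because a billing spike is often the first visible sign, a simple baseline comparison on daily usage can shorten the time to detection. The sketch below is one possible approach, not a recommended threshold: the seven-day window, the 3x factor, and the sample figures are assumptions, and the input is a hypothetical list of daily token counts exported from your provider's usage data.

```python
from statistics import median

def flag_spikes(daily_tokens, window=7, factor=3.0):
    """Return indexes of days whose usage exceeds `factor` times the
    median of the preceding `window` days."""
    flagged = []
    for i in range(window, len(daily_tokens)):
        baseline = median(daily_tokens[i - window:i])
        if baseline > 0 and daily_tokens[i] > factor * baseline:
            flagged.append(i)
    return flagged

# Hypothetical daily token counts; the last two days jump sharply.
usage = [100, 110, 95, 105, 98, 102, 99, 101, 900, 950]
print(flag_spikes(usage))  # [8, 9]
```

A median baseline is used so that the first day of abuse does not immediately inflate the baseline and hide the days that follow.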

How does Anomity reduce exposure to scanning and model-pull abuse like this?

Anomity treats local model runtimes and gateways as AI artifacts on the endpoint, so it inventories the Ollama or OpenAI-compatible process, its version, and the agents and CLIs that drive it. On agents that expose a hook, such as Claude Code PreToolUse, Anomity returns allow, deny, or log on each tool call before it runs, so a call that triggers a model pull to an unapproved registry or routes to an unsanctioned endpoint can be denied or logged in line. Every decision lands in a queryable 90-day audit trail, giving responders the timeline to scope reconnaissance and SSRF activity across the fleet.
