Now in early access, book a 30-minute demo →
← Back to blog AdvisoryHigh

MCP Tool Poisoning via Hidden Instructions in Tool Descriptions - Invariant Labs Disclosure

MCP Server Security·High·Tool Poisoning Attack (Invariant Labs); MCPTox benchmark·
Affected Any MCP client that injects untrusted server tool descriptions into the model context (demonstrated against Cursor)

On April 1, 2025, Invariant Labs disclosed a Tool Poisoning Attack against the Model Context Protocol: a malicious MCP server defines a tool whose description and parameter metadata carry hidden prompt-injection instructions, and the MCP client injects that text into the model context during enumeration. This advisory covers what was disclosed, why it lands on the agentic-endpoint layer, and how to govern it - because unlike the transport-layer MCP bugs, there is no command to block, only a model to steer.

What happened

Invariant Labs (Tool Poisoning Attack) showed that MCP clients inject all tool descriptions from every connected server into the model context window when they enumerate tools. A malicious server can therefore place prompt-injection instructions inside a tool's description field and its parameter metadata. The client UI shows the user only a truncated, innocuous summary, so the steering text is never directly visible - but the model reads the full description when it plans. Critically, the tool does not even need to be invoked: loading it into context is enough.

Invariant demonstrated the technique against Cursor in April 2025, steering the agent through a connected server's poisoned description without the user seeing the malicious text. Microsoft later documented the same behavior as an indirect prompt-injection vector for MCP, and the MCPTox benchmark, published in August 2025, formalized poisoned tool descriptions as a primary attack template. A separate 2025 academic study that analyzed 1,899 open-source MCP servers found about 5.5% carried this kind of poisoned metadata, so the exposure spans real, reachable servers.

This is a content-layer attack, and that sets it apart from the rest of the 2026 MCP wave. The transport- and command-layer issues - the STDIO by-design command execution class and bugs like CVE-2025-6514 in mcp-remote - turn on untrusted strings reaching a subprocess. Tool poisoning injects nothing and spawns nothing; it manipulates the model's plan through metadata the server contributes. It complements those CVEs rather than overlaps them; the full surface is mapped in the pillar on MCP server security.

Why this is an agentic-endpoint risk

An MCP server is not a passive dependency. It is code an AI agent connects to, and the metadata it advertises becomes part of the prompt that drives the agent's behavior. With tool poisoning, the dangerous surface is the description itself - text that lands in context the moment the client enumerates tools. A user who approves a server because its visible summary looks harmless has already loaded whatever the server hid in the full description.

This exposure is invisible to the controls you already run, because it lives in the AI artifact layer. The poisoned text sits in a tool definition the client never fully renders; EDR sees a legitimate process talking to a legitimate endpoint; the network sees encrypted agent traffic; and DLP sees nothing at rest. MCP servers are one of the eight AI artifact types adopted bottom-up that report to no security tool - the same dynamic that makes AI agents and MCP servers the new shadow IT. You cannot tell which connected servers are steering your agents without an inventory of the artifact layer.

How Anomity surfaces and governs it

Because the model itself can be steered, the durable defense is not to trust the plan the model produces - it is to inventory the metadata reaching context and gate the tool calls that result. Anomity does that in three steps.

First, inventory. Anomity inventories every MCP server on every managed endpoint as part of the eight AI artifact types it tracks, capturing the metadata - including the tool descriptions and parameter definitions a server contributes to the model context, and the agent it is connected to. It classifies each server by trust signals so a newly added or recently changed server stands out. Metadata only: secrets are redacted on the endpoint before anything leaves it.

Second, decide at the hook. Treating server-supplied metadata as untrusted is necessary but not sufficient, because a poisoned description can steer the model even when the user never sees it. So on agents that expose a hook - for example, the PreToolUse event in Claude Code - Anomity evaluates each tool call against your policy and returns allow, deny, or log before the call runs. A call the model was steered into can be denied at the boundary regardless of how convincing the poisoned plan was - exactly the control runtime governance provides for an attack that targets the model's reasoning rather than the host.

Third, keep the record. Every decision, and every added, changed, or removed MCP server and its tool metadata, lands in a queryable 90-day audit trail, and decisions route to SIEM, Slack, email, or Jira. When research like MCPTox flags a server pattern you run, or a server's descriptions change, you can answer where it is and what it did - from a record, not a guess. Anomity complements your Network, EDR, DLP, and GRC tooling; it covers the artifact layer those tools never inventoried.

You can't govern what you can't see.The Anomity principle

What to check across your fleet

  • Inventory every MCP server connected on every endpoint and capture the full tool descriptions and parameter metadata each server contributes - not just the summary the client UI renders.
  • Treat server-supplied tool metadata as untrusted input, and pin and review the descriptions a client loads so a server cannot silently change what reaches the model context.
  • Flag newly added or recently changed servers, since loading a poisoned description into context is enough to steer an agent - the tool need never be invoked.
  • Cross-reference your inventory against the MCPTox attack templates and the roughly 5.5% poisoned-metadata rate reported by a 2025 academic survey of open-source MCP servers, and treat a match as in-scope.
  • Confirm that tool calls are evaluated at the agent hook with allow/deny/log, so a call the model was steered into is stopped before it runs rather than trusted because the plan looked reasonable.
  • Verify that every decision and every change to a server's tool metadata is written to a 90-day audit trail and routed to your SIEM, so you can scope the next content-layer finding.

Tool poisoning is not a patch-and-forget CVE - it is a content-layer technique any connected server can use, and the model is what gets steered. The way through is to treat tool metadata as untrusted, inventory every server and the descriptions it loads, and govern the resulting tool calls at the hook. For the full attack surface and how the content layer sits alongside the transport- and command-layer CVEs, see the pillar guide on MCP server security. To see Anomity govern the MCP layer across your fleet, request early access.

Frequently asked questions

What is an MCP tool poisoning attack?

Tool poisoning is a content-layer attack disclosed by Invariant Labs on April 1, 2025. A malicious MCP server defines a tool whose description and parameter metadata carry hidden prompt-injection instructions. MCP clients inject every connected server's tool descriptions into the model context during enumeration, so the instructions reach the model even though the client UI shows only a short, innocuous summary. The tool does not have to be invoked - loading it into context is enough, because the agent reads every description when it plans. The result is that any connected server can steer model behavior through fields the user never directly sees.

How is tool poisoning different from the MCP RCE CVEs?

The MCP remote code execution issues live at the transport and command layer: untrusted strings reach a subprocess spawn, as in the STDIO by-design class and CVE-2025-6514 in mcp-remote. Tool poisoning is a content-layer attack. Nothing is spawned and no command is injected; instead, the model's plan is manipulated by text the server supplies in tool metadata. The two are complementary. A server can be free of command-injection bugs and still poison the model through its descriptions, which is why inventory and runtime governance have to cover both the launch command and the metadata a server contributes to context.

Was tool poisoning demonstrated against a real product?

Yes. Invariant Labs demonstrated the technique against Cursor in April 2025, showing that a connected server's poisoned tool description could steer the agent without the user seeing the malicious text. Microsoft later documented the same behavior as an indirect prompt-injection vector for MCP, and the August 2025 MCPTox benchmark formalized poisoned tool descriptions as a primary attack template for evaluating MCP clients. A separate 2025 academic study (Hasan et al.) that analyzed 1,899 open-source MCP servers found about 5.5% carried this kind of poisoned metadata, so the exposure is not theoretical or limited to one client.

How do you defend against poisoned tool descriptions?

Treat all server-supplied tool metadata as untrusted input rather than trusted documentation. Pin and review the tool descriptions a client loads, so a server cannot silently change what reaches the model context. Most importantly, gate the resulting tool calls rather than trusting the model's plan: the model can be steered, so the durable control is policy at the point a tool call runs. Anomity inventories every MCP server and its tool metadata, evaluates each call at the agent hook with allow, deny, or log, and records every decision in a 90-day audit trail.

Ask AI about Anomity
ChatGPT Claude Perplexity Google AI Grok