IBM AI Risk Atlas for Agentic AI: A Catalogue of Agent Risks (2026)
- The IBM AI Risk Atlas is a taxonomy of AI risks organized by where each risk originates - Input (training data), Inference, Output, and Non-Technical - with a dedicated set of agentic risks for autonomous AI agents.
- It is one of the few major frameworks to enumerate agentic risks (tool use, autonomy, identity/trust) as first-class, individually named items rather than folding them into generic LLM risk.
- It is operationalized inside IBM watsonx.governance - driving out-of-box risk-identification assessments and the new Model Risk Evaluation Engine - and mapped to NIST AI RMF, the OWASP Top 10s, and the EU AI Act via the open-source Risk Atlas Nexus project.
- The Atlas asserts the *need* for traceable, accountable agent behavior; it does not, by itself, discover the agents and MCP servers running on your fleet.
- To score real systems against the Atlas, security teams need continuous, runtime visibility into which agents exist, what tools and MCP servers they call, and with what permissions.
Most AI risk frameworks were written for a world of models you deploy and call. The IBM AI Risk Atlas is one of the few that treats *agents* - software that pursues goals and acts on your environment through tools - as a distinct source of risk worth naming in its own right. For security leaders watching AI agents and MCP servers proliferate across the fleet, that distinction matters. It is the difference between a taxonomy that folds agent behavior into generic 'LLM risk' and one that names function-calling hallucinations, misaligned actions, and untraceable tool use as the specific failure modes they are.
This guide explains what the IBM AI Risk Atlas is, how its taxonomy is actually structured, how each part maps onto the agentic attack surface, and how a security or GRC team operationalizes it. We stick to what the published sources support - IBM's live documentation, the 2025 arXiv paper, and the open-source tooling - and flag where the framework is genuinely new and still settling.
What the IBM AI Risk Atlas is
The IBM AI Risk Atlas is a taxonomy of AI risks published by IBM, rooted in IBM Research and the company's AI Ethics Board work (notably the report *Foundation models: Opportunities, risks and mitigations*). It catalogs the risks posed by traditional machine learning, generative AI / foundation models, and AI agents; explains the potential consequence of each risk; and groups them so practitioners can focus on what is relevant to a given use case.
Two things make it more than a list. First, it is operationalized inside IBM watsonx.governance - it is not just documentation; the risks are integrated into the Governance Console out of the box and drive tooling. Second, it has been formalized academically: the 2025 paper *AI Risk Atlas: Taxonomy and Tooling for Navigating AI Risks and Resources* (arXiv:2503.05780) describes the taxonomy and the companion open-source project, Risk Atlas Nexus (github.com/IBM/risk-atlas-nexus), which maps Atlas risks to mitigations and to other frameworks.
A note on scope and versioning: the Atlas is continuously updated in IBM's live documentation, so exact risk counts drift over time, and different crosswalks tally them slightly differently. Treat any precise total as approximate. What is stable is the structure.
The taxonomy: categories and dimensions
The Atlas uses a two-level structure. The top level groups risks by origin - where in the AI lifecycle the risk arises. Within each category, individual named risks are clustered by dimension - a cross-cutting concern that recurs across the lifecycle. Risks that are specific to, or amplified by, agentic AI are flagged across this structure and documented as their own set.
Top-level categories (by origin)
- Input (training data) risks - risks originating in the data used to build or fine-tune a model: uncertain or improper data provenance, data poisoning, unrepresentative or biased training data, confidential or personal information in training data, and consent / usage-rights issues.
- Inference risks - risks that manifest while the model is being queried: prompt injection and indirect (instruction) attacks, extraction and membership-inference attacks, model evasion, exposure of personal or confidential information at inference time, and robustness failures under adversarial input.
- Output risks - risks in what the model produces: hallucination and inaccuracy, toxic or harmful output, biased output, unreliable or missing source attribution, IP and confidential-data leakage in outputs, and lack of explainability.
- Non-Technical risks - organizational, legal, ethical, and governance risks not tied to a specific model mechanism: governance gaps, regulatory and accountability challenges, lack of transparency, impact on jobs, societal and environmental impact, and misuse.
IBM's documentation organizes the Atlas around these four origin categories. On top of that, it explicitly marks which risks are specific to agentic AI and which are more severe or more likely because of agentic AI - and gives each agentic risk its own documentation page. That agentic set is precisely the part most relevant to anyone governing a fleet of AI agents and MCP servers, so we treat it separately below.
Cross-cutting dimensions
Within categories, risks are grouped by dimension so a team can zero in on what it cares about. Recurring dimensions include accuracy, fairness, explainability / transparency, robustness, privacy, value alignment, misuse, and societal / governance impact. Each individual risk also has its own dedicated IBM Documentation page describing the risk, why it matters, and related concepts - for example pages for *Over- or under-reliance on AI agents*, *Function calling hallucination*, and *Unauthorized use*.
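To make the two-level structure concrete, here is a minimal sketch of how a team might model it in its own risk register. This is not IBM's schema: the `Risk` class, the `select` helper, and the origin and dimension assigned to each sample entry are assumptions for illustration. Check the live Atlas page for each risk's actual placement.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    """The four top-level origin categories described above."""
    INPUT = "Input (training data)"
    INFERENCE = "Inference"
    OUTPUT = "Output"
    NON_TECHNICAL = "Non-Technical"


@dataclass(frozen=True)
class Risk:
    name: str
    origin: Origin
    dimension: str
    agentic: bool = False  # specific to, or amplified by, agentic AI


# Illustrative entries only. The origin/dimension pairings are assumptions,
# not a copy of IBM's documentation.
RISKS = [
    Risk("Data poisoning", Origin.INPUT, "robustness"),
    Risk("Prompt injection", Origin.INFERENCE, "robustness"),
    Risk("Hallucination", Origin.OUTPUT, "accuracy"),
    Risk("Function calling hallucination", Origin.OUTPUT, "accuracy", agentic=True),
    Risk("Unauthorized use", Origin.NON_TECHNICAL, "misuse", agentic=True),
]


def select(risks, origin=None, dimension=None, agentic_only=False):
    """Return the risks that match every filter that was supplied."""
    return [
        r for r in risks
        if (origin is None or r.origin is origin)
        and (dimension is None or r.dimension == dimension)
        and (not agentic_only or r.agentic)
    ]


agentic_names = [r.name for r in select(RISKS, agentic_only=True)]
```

Keeping the agentic flag separate from origin and dimension mirrors how the Atlas marks agentic risks across the structure rather than placing them in a fifth category.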
The agentic risks in detail
The agentic risks are the Atlas's distinguishing strength. They treat an agent - software that pursues goals and acts in its environment via tools - as its own risk surface, with each item named and documented individually. Representative entries include:
- Unexplainable and untraceable actions - you cannot reconstruct what the agent did or why.
- Misaligned actions - the agent acts contrary to intended goals or constraints.
- Over- or under-reliance on AI agents - humans trust the agent too much, or too little, in ways that create risk.
- Function calling hallucination - the agent generates wrong functions or wrong parameters, producing incorrect, unnecessary, or harmful tool calls.
- Attack on AI agents' external resources - attackers exploit vulnerabilities in the tools, databases, services, or other agents an agent relies on to act.
- Unauthorized use and exploit trust mismatch - the agent is used, or acts, beyond what it is authorized to do, or trust boundaries between components are abused.
- Sharing IP / PI / confidential information with tools or users - sensitive data leaks outward through tool calls or responses.
- Redundant actions, plus challenges around insufficient agent evaluation, mitigation and maintenance, and a lack of transparency, reproducibility, and accountability.
The Atlas also makes an architectural observation that resonates with anyone who has watched agents in production: increasing an agent's autonomy to select and consult tools increases behavioral variability. More autonomy, more tools, more non-determinism - and a wider surface for the risks above to land.
Mapping the Atlas onto AI agents and MCP servers
The value of a taxonomy for a security team is that it gives shared names to things you have to defend against. Here is how the Atlas's items map to the concrete agentic attack surface - and, crucially, to *what you would need to observe* to know whether each risk is materializing.
| Atlas item | Agentic / MCP manifestation | What a team must monitor |
|---|---|---|
| Unauthorized use; unexplainable & untraceable actions | Shadow agents and MCP servers running unseen across endpoints and the fleet | Inventory of every agent and MCP server, and a record of what each did |
| Misaligned / redundant actions; autonomy increases variability | Over-privileged agents; ungoverned MCP tool invocation | Per-agent tool permissions and which tools are actually being called |
| Indirect instructions attack (Inference) | A poisoned web page or tool result hijacks an agent's tool calls | Anomalous tool-call sequences and behavior changes after external inputs |
| Attack on external resources; function calling hallucination; exploit trust mismatch | Malicious or vulnerable MCP servers; agents calling tools they shouldn't | Which agents talk to which MCP servers, and with what scopes |
| Sharing IP/PI/confidential info with tools/users; output and inference data leakage | Sensitive data flowing out through tool calls and MCP connections | Data crossing agent/MCP boundaries |
| Unauthorized use; exploit trust mismatch (identity) | Non-human identity: on whose behalf is the agent acting when it calls a tool? | Agent-to-identity binding for each action |
| Reproducibility & accountability challenges | No defensible record of agent decisions and actions | An immutable, per-action audit trail |
| Insufficient agent evaluation | Agents shipped without continuous, fleet-wide assessment | Ongoing runtime coverage of the live agent population |
Several of these map cleanly to attack patterns we have documented elsewhere. Indirect instruction attacks, which the Atlas places in the Inference category, are the dominant agentic compromise path - see our breakdown of multi-agent prompt injection and credential theft. The Atlas's *attack on external resources* and *exploit trust mismatch* are exactly what plays out in an MCP tool-poisoning campaign, and they reinforce why MCP server security deserves its own program rather than being treated as a footnote to model security.
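The third column of the table above can be treated as a coverage checklist. The sketch below condenses a few rows into a lookup and reports which Atlas items a team cannot yet assess with the telemetry it collects. The signal names (`agent_inventory`, `action_log`, and so on) are hypothetical labels invented for this example, not fields from any product.

```python
# Atlas item -> telemetry needed to observe it (condensed from the table above).
# Signal names are hypothetical labels chosen for this sketch.
REQUIRED_SIGNALS = {
    "Unexplainable and untraceable actions": {"agent_inventory", "action_log"},
    "Misaligned actions": {"tool_permissions", "tool_calls"},
    "Indirect instructions attack": {"tool_calls", "external_inputs"},
    "Attack on AI agents' external resources": {"mcp_connections", "mcp_scopes"},
    "Sharing IP/PI/confidential information": {"data_flows"},
    "Exploit trust mismatch": {"identity_binding"},
}


def coverage_gaps(available):
    """Map each Atlas item to the signals still missing; omit covered items."""
    gaps = {}
    for risk, needed in REQUIRED_SIGNALS.items():
        missing = needed - set(available)
        if missing:
            gaps[risk] = sorted(missing)
    return gaps


# Example: a team that only inventories agents and logs tool calls/permissions.
gaps = coverage_gaps({"agent_inventory", "tool_calls", "tool_permissions"})
```

In this example only *Misaligned actions* is fully observable; every other item still lacks at least one signal, which is the kind of gap list a GRC team can act on.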
How a security team operationalizes the Atlas
The Atlas is most useful as connective tissue between a high-level governance program and concrete engineering controls. A practical sequence:
- Adopt the vocabulary. Use Atlas risk names in your risk register and AI use-case reviews so security, GRC, and platform teams describe the same risk the same way. Naming *function calling hallucination* beats arguing about whether 'the agent did something weird' counts as a finding.
- Filter to what applies. The category-and-dimension structure exists so you can scope. For an autonomous coding agent, the agentic risks plus Inference (prompt injection) and Output (data leakage) are where you spend attention; you can de-prioritize dimensions that do not apply.
- Crosswalk to your other frameworks. Use Risk Atlas Nexus to map Atlas risks to the NIST AI RMF, the OWASP Top 10s (for LLM and for Agentic Applications), and the EU AI Act so you are not maintaining parallel, disconnected risk lists.
- Wire it into governance tooling. Inside watsonx.governance, the Atlas is built into the Governance Console, drives out-of-box risk-identification assessments (AI model onboarding, and use case + model combined), and feeds the Model Risk Evaluation Engine, which computes metrics tied to Atlas risk dimensions.
- Close the runtime gap. Recognize that an assessment is only as good as its inputs. An Atlas-aligned evaluation of an agent assumes you know the agent exists and can observe what it does - which brings us to the gap the Atlas itself names but does not fill.
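As a rough illustration of the crosswalk step, the sketch below reads an SSSOM-style tab-separated mapping and groups target-framework entries by Atlas risk. It uses only the Python standard library, not the Risk Atlas Nexus API. Every identifier in the sample rows is a hypothetical placeholder rather than a real Atlas, OWASP, or NIST ID, and the real mapping files may carry additional columns.

```python
import csv
from collections import defaultdict

# SSSOM-style rows: subject, predicate, object, justification.
# All identifiers are hypothetical placeholders for illustration.
ROWS = [
    ("subject_id", "predicate_id", "object_id", "mapping_justification"),
    ("atlas:function-calling-hallucination", "skos:closeMatch",
     "owasp-agentic:EXAMPLE-1", "semapv:ManualMappingCuration"),
    ("atlas:function-calling-hallucination", "skos:relatedMatch",
     "nist-ai-rmf:EXAMPLE-2", "semapv:ManualMappingCuration"),
    ("atlas:unauthorized-use", "skos:closeMatch",
     "owasp-agentic:EXAMPLE-3", "semapv:ManualMappingCuration"),
]
SSSOM_TSV = "\n".join("\t".join(row) for row in ROWS)


def load_crosswalk(tsv_text):
    """Group (predicate, object) pairs by subject, skipping '#' metadata lines."""
    lines = [ln for ln in tsv_text.splitlines() if ln and not ln.startswith("#")]
    crosswalk = defaultdict(list)
    for row in csv.DictReader(lines, delimiter="\t"):
        crosswalk[row["subject_id"]].append((row["predicate_id"], row["object_id"]))
    return dict(crosswalk)


crosswalk = load_crosswalk(SSSOM_TSV)
```

Keeping the predicate alongside each target preserves the difference between a close match and a looser relation, which matters when one register entry has to satisfy several frameworks at once.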
Where the Atlas stops - and where visibility begins
The Atlas is deliberately a taxonomy, not a sensor. It asserts that *unexplainable and untraceable actions*, *unauthorized use*, and *accountability challenges* are real risks of agentic systems. It does not - and is not designed to - tell you which agents and MCP servers are actually running on your endpoints, what permissions they hold, or what they did at 3 a.m. last Tuesday. That is the difference between a risk being *named* and a risk being *measured against your real environment*.
This is the gap continuous agent and MCP visibility fills, and it is the category Anomity works in. To score anything against the agentic risks, you first have to discover the agents - a step that is harder than it sounds, because so many of them arrive as developer tooling and personal assistants rather than sanctioned deployments. This is the core argument for why AI agents and MCP servers are the new shadow IT, and it is consistent with what we find when we scan AI agent configs: credentials, over-broad permissions, and unvetted MCP servers wired into agents nobody on the security team knew existed.
The honest framing is this: the Atlas gives you the *questions* - does this agent take untraceable actions, can it exploit a trust mismatch, does it share confidential data with tools - and continuous visibility into the agent and MCP layer gives you the *evidence* to answer them per agent, per action. The two are complementary. A governance program that adopts the Atlas without runtime discovery is reasoning about a population it cannot see; runtime discovery without a taxonomy produces alerts no one can categorize or report on.
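A minimal sketch of that pairing, assuming a discovery process already yields one record per agent. The `AgentRecord` fields and the three checks are illustrative assumptions, not an Atlas-defined scoring method; a real program would draw on far richer evidence.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class AgentRecord:
    """Hypothetical shape of one discovered agent's runtime evidence."""
    name: str
    allowed_tools: Set[str]
    called_tools: Set[str]
    acting_identity: Optional[str] = None  # on whose behalf the agent acts
    has_audit_trail: bool = False


def atlas_findings(agent):
    """Tag observed evidence with Atlas risk names (simplified heuristics)."""
    findings = []
    if agent.called_tools - agent.allowed_tools:
        findings.append("Unauthorized use")  # called a tool it was not granted
    if agent.acting_identity is None:
        findings.append("Exploit trust mismatch")  # no agent-to-identity binding
    if not agent.has_audit_trail:
        findings.append("Unexplainable and untraceable actions")
    return findings


shadow = AgentRecord("coding-assistant", {"read_file"}, {"read_file", "shell_exec"})
governed = AgentRecord("ticket-bot", {"create_ticket"}, {"create_ticket"},
                       acting_identity="svc-ticket-bot", has_audit_trail=True)
shadow_findings = atlas_findings(shadow)
governed_findings = atlas_findings(governed)
```

The taxonomy supplies the finding labels and discovery supplies the records; neither half produces a reportable result alone.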
Bottom line
The IBM AI Risk Atlas earns a place in any serious AI governance program for one reason above the rest: it treats agentic risk as a set of named, specific failure modes, rather than diluting it into general LLM concerns. Use it to standardize how your organization names and reasons about AI risk, to crosswalk to NIST, OWASP, and the EU AI Act, and to feed governance tooling. Then pair it with continuous discovery and monitoring of the agent and MCP layer - because, as the Atlas itself effectively concedes when it names untraceable and unauthorized agent actions, you cannot govern what you cannot see.
For teams whose primary exposure is developer-side automation, the same logic applies one layer down - see our guides on securing AI coding agents and CLIs and governing AI coding assistants across your fleet, where the Atlas's agentic risks show up most acutely first.
Frequently asked questions
What is the IBM AI Risk Atlas?
It is a taxonomy of AI risks published by IBM, rooted in IBM Research and IBM's AI Ethics Board work. It catalogs named risks across traditional machine learning, generative AI, and AI agents, explains the potential consequence of each, and groups them by origin and by risk dimension so practitioners can focus on what is relevant to their use case.
How is the IBM AI Risk Atlas organized?
On two levels. The top level groups risks by where they originate: Input (training data), Inference, Output, and Non-Technical. Within each category, risks are clustered by cross-cutting dimensions such as accuracy, fairness, explainability, robustness, privacy, value alignment, and misuse. Risks that are specific to or amplified by agentic AI are flagged and documented as their own set.
Does the IBM AI Risk Atlas cover AI agents?
Yes. The Atlas was extended with risks that are specific to, or amplified by, agentic AI - including misaligned actions, function calling hallucination, attack on AI agents' external resources, unauthorized use, over- or under-reliance on AI agents, and unexplainable and untraceable actions.
How many risks are in the IBM AI Risk Atlas?
The Atlas enumerates dozens of individually named risks across its categories. Because IBM updates the live documentation over time and different crosswalks count slightly differently, treat any precise total as approximate rather than fixed. What is stable is the structure: four origin categories, cross-cutting dimensions, and a documented set of agentic risks.
How does the IBM AI Risk Atlas relate to NIST AI RMF, OWASP, and the EU AI Act?
The open-source Risk Atlas Nexus project builds an ontology that maps Atlas risks to mitigations and to external frameworks - including the NIST AI RMF (Generative AI Profile), the OWASP Top 10 for LLM Applications, the OWASP Top 10 for Agentic Applications, and EU AI Act questionnaires. It uses LinkML and SSSOM mappings to link the taxonomies.
How is the IBM AI Risk Atlas used in practice?
It is wired into IBM watsonx.governance out of the box. The risks are integrated into the Governance Console and drive the out-of-box risk-identification assessments (AI model onboarding, and use case + model combined), and they underpin the Model Risk Evaluation Engine, which computes metrics tied to Atlas risk dimensions.
Is the IBM AI Risk Atlas a compliance standard?
No. It is a risk taxonomy and reference, not a certifiable standard or regulation. It helps teams name and reason about risks consistently and feeds governance tooling, but it does not impose mandatory controls the way a regulation or audited standard does.
Does the IBM AI Risk Atlas discover agents running in my environment?
No. The Atlas defines and categorizes risks, including the risk of unauthorized and untraceable agent actions. It does not inventory the agents and MCP servers actually running on your endpoints or fleet. That discovery and runtime monitoring is a separate visibility problem.




