
OWASP AIVSS: Scoring Agentic AI Vulnerabilities Beyond CVSS (2026)

Anomity Research · Jun 21, 2026 · 11 min read

TL;DR

OWASP AIVSS (AI Vulnerability Scoring System) is an open OWASP project that scores agentic AI vulnerabilities by taking a standard CVSS v4.0 base score and adding an agentic *uplift* - because a small technical flaw becomes far more dangerous when an autonomous agent can act on it.
The v0.8 formula is AIVSS = (CVSS_Base + AARS) × Mitigation_Factor, where the Agentic AI Risk Score (AARS) is the uplift (10 − CVSS_Base) × (Factor_Sum / 10) × ThM. The result stays on a 0–10 scale.
It ships in two parts: a taxonomy of 10 OWASP Agentic AI Core Security Risks and a scoring model driven by 10 Agentic Risk Amplification Factors (autonomy, tool surface, identity, multi-agent interaction, persistence, and more), each scored 0.0, 0.5, or 1.0.
AIVSS is still pre-1.0. The current scoring release is v0.8 (published March 19, 2026); v0.5 was the initial release. It is grounded in the NIST AI RMF and informed by the CSA MAESTRO threat model. Treat it as an emerging, evolving standard.
The amplification factors map almost exactly to what a security team must discover per agent and MCP server - making continuous inventory and behavior monitoring the data source that makes AIVSS scoring possible at scale.

For two decades, CVSS has been the lingua franca of vulnerability severity. It works because it measures a stable thing: how bad is this flaw, and how hard is it to exploit. Agentic AI breaks that assumption. The same zero-click injection that scores a polite CVSS 6.5 in a static web app becomes catastrophic when the thing reading the malicious input is an autonomous agent holding live credentials, a dozen connected tools, and standing permission to act without a human in the loop. The flaw didn't change. The blast radius did.

OWASP AIVSS - the AI Vulnerability Scoring System - exists to close exactly that gap. It is an open OWASP project that produces a standardized, quantifiable score for security risk in AI systems, beginning with agentic AI. Rather than replace CVSS, it extends it: a standard CVSS v4.0 base score is augmented with a new Agentic AI Risk Score (AARS) - an *uplift* that captures how much an agent's design amplifies any given flaw. This guide explains what AIVSS actually is, walks its real taxonomy and math, and shows how a security team can put it to work on a fleet of agents and MCP servers.

What AIVSS is (and isn't)

AIVSS is published at aivss.owasp.org as an open community project. It is important to set expectations on maturity: AIVSS is pre-1.0 and actively evolving. The current scoring release is v0.8, published March 19, 2026 ahead of RSAC 2026; v0.5 was the earlier initial release. Both are draft documents, and v0.8 carries the fuller, current methodology. Treat AIVSS as an emerging standard you can pilot and influence - not a frozen control you bolt on and forget.

The framework is aligned with the NIST AI Risk Management Framework - its Govern, Map, Measure, and Manage functions - and informed by the Cloud Security Alliance's MAESTRO threat-modeling work. A separate crosswalk maps AIVSS to the AIUC-1 control set, letting teams move from a score to a corresponding remediation control. So while it is young, it is not isolated - it sits inside the broader AI governance and risk framework ecosystem that security teams are already tracking.

AIVSS ships as two distinct deliverables that are easy to conflate but serve different jobs:

Part 1 - a taxonomy: the 10 OWASP Agentic AI Core Security Risks, a vocabulary for the kinds of things that go wrong with agents.
Part 2 - a scoring system: the AIVSS-Agentic model, a math procedure that turns a specific finding into a single 0–10 number.

The Amplification Principle

The conceptual heart of AIVSS is what the project calls the Amplification Principle: an agent's autonomy and goal-directed behavior can take a minor technical flaw and magnify its real-world impact dramatically. A read-only information leak in a passive system is one thing. The same leak feeding an agent that can chain tool calls, pivot across SaaS integrations, and pursue a hijacked goal is another thing entirely. AIVSS encodes this by refusing to score the flaw alone - it always adds an uplift driven by the context the flaw lives in.

The 10 Agentic AI Core Security Risks

The taxonomy gives security teams a shared name for each failure mode. These are the risk categories you classify a finding *into*; the amplification factors (below) are separate. The names and descriptions here track the v0.8 document's own framing.

#	Risk	What it covers
1	Agentic AI Tool Misuse	An attacker manipulates an agent into abusing its connected tools/APIs (including MCP servers) to take unauthorized actions.
2	Agent Access Control Violation	Agents exceed or escape intended permission boundaries - confused-deputy patterns, SaaS-to-SaaS pivoting via pre-authorized integrations.
3	Agent Cascading Failures	A fault or compromise in one agent propagates through chains and workflows, amplifying impact across the system.
4	Agent Orchestration and Multi-Agent Exploitation	Attacks on multi-agent control flow, orchestration, and inter-agent trust - control-flow hijacking, rogue autonomy.
5	Agent Identity Impersonation	An attacker forges or assumes an agent's identity, or an agent shifts roles/permissions to act as another entity.
6	Agent Memory and Context Manipulation	Poisoning, drift, cross-user contamination, or residual-memory exploitation of stored context to alter future behavior.
7	Insecure Agent Critical Systems Interaction	Unsafe agent interaction with high-impact or critical systems and infrastructure.
8	Agent Supply Chain and Dependency Risk	Compromise via the agent's dependencies, tools, models, or third-party/MCP components - for example, tool poisoning.
9	Agent Untraceability	No reliable audit trail or observability; agents act asynchronously and autonomously with no attribution.
10	Agent Goal and Instruction Manipulation	Goal hijacking and prompt-injection-style manipulation of objectives, amplified by autonomy.

The scoring model and the AARS

Once a finding is classified, AIVSS scores it. The key shift from CVSS is that AIVSS does not average the technical flaw with the agentic context - it treats the AARS as an *uplift* that fills the gap between the CVSS base score and the maximum of 10, scaled by how agentic the system is. The final score is computed in a short procedure:

Risk_Gap   = 10 - CVSS_Base
AARS       = Risk_Gap * (Factor_Sum / 10) * ThM
AIVSS_raw  = (CVSS_Base + AARS) * Mitigation_Factor
AIVSS      = RoundHalfUp(AIVSS_raw, 1)

The inputs that drive it:

CVSS_Base - a standard CVSS v4.0 base score (0–10) for the underlying technical flaw. Nothing new here; you score the defect exactly as you always have.
Factor_Sum - the 0.0–10.0 sum of the 10 Agentic Risk Amplification Factors. This is the new dimension that quantifies how agentic the system is.
ThM - a Threat Multiplier tied to exploit maturity. It is not fixed at 1.0; the worked examples in the v0.8 document use 0.97, and the value is an initial reference open to community review.
Mitigation_Factor - a scaling factor for compensating controls that defaults to 1.0 (no or weak mitigation) unless a lower value is justified.

The defining design choice is the *uplift* mechanic: a flaw with a low CVSS base attached to a highly agentic system gets pushed up toward 10, while the same flaw in a non-agentic system (Factor_Sum near 0) barely moves. The output stays on a familiar 0–10 scale with the usual CVSS severity bands (Critical 9.0–10.0, High 7.0–8.9, Medium 4.0–6.9, Low 0.1–3.9), so it slots into existing triage thresholds. AIVSS reports a single number; CVSS_Base, Factor_Sum, ThM, Mitigation_Factor, and AARS are recorded as supporting evidence, not as a separate formatted vector. That distinction matters: a final 7.1 built from a CVSS 2.1 plus a large agentic uplift is a very different problem from a 7.1 built from a CVSS 6.8 with a small uplift, and the recorded inputs keep that visible.
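The procedure and severity bands above can be sketched in a few lines of Python. This is a minimal illustration, not the project's reference implementation: the names `score_finding`, `AivssResult`, and `severity_band` are ours, the default `thm` of 0.97 is the value cited from the v0.8 worked examples, and the Factor_Sum values of 6.5 and 1.0 are picked only to reproduce the two 7.1 cases.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class AivssResult:
    """Final score plus the inputs kept as supporting evidence (illustrative type)."""
    cvss_base: Decimal
    factor_sum: Decimal
    thm: Decimal
    mitigation_factor: Decimal
    aars: Decimal
    score: Decimal


def score_finding(cvss_base, factor_sum, thm="0.97", mitigation_factor="1.0"):
    """Apply the four-line procedure: Risk_Gap, AARS, raw score, round half up."""
    # Decimal avoids binary floating-point surprises when rounding to one decimal.
    base, fsum = Decimal(str(cvss_base)), Decimal(str(factor_sum))
    thm, mit = Decimal(str(thm)), Decimal(str(mitigation_factor))
    if not (0 <= base <= 10 and 0 <= fsum <= 10):
        raise ValueError("CVSS_Base and Factor_Sum must both be within 0-10")
    risk_gap = Decimal(10) - base
    aars = risk_gap * (fsum / Decimal(10)) * thm
    raw = (base + aars) * mit
    score = raw.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return AivssResult(base, fsum, thm, mit, aars, score)


def severity_band(score):
    """Map a 0-10 score to the CVSS-style bands quoted in the text."""
    score = Decimal(str(score))
    if score >= Decimal("9.0"):
        return "Critical"
    if score >= Decimal("7.0"):
        return "High"
    if score >= Decimal("4.0"):
        return "Medium"
    if score >= Decimal("0.1"):
        return "Low"
    return "None"


# Same final score, very different composition.
agentic = score_finding(2.1, 6.5)    # low CVSS, large agentic uplift
technical = score_finding(6.8, 1.0)  # higher CVSS, small agentic uplift
for result in (agentic, technical):
    print(result.score, severity_band(result.score), "AARS:", result.aars)
```

Both findings land on 7.1 (High), but the recorded AARS is roughly 4.98 in the first case and roughly 0.31 in the second, which is the difference the evidence fields are meant to preserve.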

The 10 amplification factors

The Factor_Sum is built from 10 Agentic Risk Amplification Factors, each describing a property of how the agent is designed and deployed. Each is scored on a simple three-point scale: 0.0 (None / not present), 0.5 (Partial / limited), or 1.0 (Full / unconstrained). Summed, they yield the 0.0–10.0 Factor_Sum. The order below is the order the v0.8 document uses for the calculation.

Execution Autonomy - ability to commit actions without human verification.
External Tool Control Surface - breadth and privilege of external APIs/tools the agent can access (including MCP).
Natural Language Interface - reliance on unstructured natural language to drive control logic and execution.
Contextual Awareness - use of environmental sensors or broad data context to drive decisions.
Behavioral Non-Determinism - variance in output or action for identical inputs.
Opacity & Reflexivity - the lack of internal visibility or ability to audit decision logic.
Persistent State Retention - ability to retain memory or state across sessions.
Dynamic Identity - ability to assume different roles or permissions at runtime.
Multi-Agent Interactions - coordination with or dependency on other autonomous agents.
Self-Modification - ability to alter its own code, prompts, or tool configurations.

Notice the symmetry: there are 10 risks *and* 10 factors, but they are not the same list. The risks name what went wrong; the factors quantify how much the agent's design multiplies the harm. An agent that is fully autonomous, has a broad tool surface, persists memory, and coordinates with other agents will carry a high Factor_Sum regardless of which specific flaw you found - because its architecture is inherently high-leverage.
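To make the three-point scale concrete, here is a small sketch of how a factor profile could be validated and summed. The snake_case keys and the `factor_sum` helper are our own naming, not identifiers from the v0.8 document, and the example profile is invented.

```python
# The ten factors, in the order listed above (key names are illustrative).
FACTORS = (
    "execution_autonomy",
    "external_tool_control_surface",
    "natural_language_interface",
    "contextual_awareness",
    "behavioral_non_determinism",
    "opacity_and_reflexivity",
    "persistent_state_retention",
    "dynamic_identity",
    "multi_agent_interactions",
    "self_modification",
)
ALLOWED_LEVELS = (0.0, 0.5, 1.0)  # None, Partial, Full


def factor_sum(profile):
    """Validate a 10-factor profile and return its 0.0-10.0 sum."""
    missing = [name for name in FACTORS if name not in profile]
    unknown = [name for name in profile if name not in FACTORS]
    if missing or unknown:
        raise ValueError(f"missing factors: {missing}; unknown factors: {unknown}")
    invalid = {name: level for name, level in profile.items()
               if level not in ALLOWED_LEVELS}
    if invalid:
        raise ValueError(f"levels must be 0.0, 0.5, or 1.0: {invalid}")
    return sum(profile[name] for name in FACTORS)


# A hypothetical agent: fully autonomous, broad tool surface, persistent
# memory, multi-agent coordination; every other factor only partial.
high_leverage_agent = dict.fromkeys(FACTORS, 0.5)
high_leverage_agent.update(
    execution_autonomy=1.0,
    external_tool_control_surface=1.0,
    persistent_state_retention=1.0,
    multi_agent_interactions=1.0,
)
print(factor_sum(high_leverage_agent))
```

Four factors at 1.0 and six at 0.5 give a Factor_Sum of 7.0 before any specific flaw has been identified.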

How this applies to AI agents and MCP servers

AIVSS is the closest scoring standard to the problem security teams actually face on the ground, because its factors *are* the attributes of a real agent or MCP server. Several mappings are worth making explicit.

MCP servers: tool surface plus two risk categories

The External Tool Control Surface factor, Risk 1 (Agentic AI Tool Misuse), and Risk 8 (Agent Supply Chain and Dependency Risk) all converge on MCP. An MCP server is, in AIVSS terms, a bundle of tool capability with a provenance and a permission scope. An unvetted MCP server pulled from a public registry is precisely the supply-chain and tool-misuse surface AIVSS is describing - and it is the same blind spot we cover in the MCP server security guide and saw exploited in the MCP tool poisoning campaign. You cannot score Risk 1 or Risk 8 for an MCP server you don't know exists.

Permissions and identity

Risk 2 (Agent Access Control Violation) and Risk 5 (Agent Identity Impersonation) pair naturally with the Dynamic Identity factor. Confused-deputy abuse, SaaS-to-SaaS pivoting through pre-authorized integrations, and non-human identity sprawl are all questions of *what permissions an agent holds and how fluidly it can change them*. The factor scoring reflects this: an agent that can dynamically assume roles or cross tenant boundaries scores higher, correctly, than one pinned to a single least-privilege service account.

Runtime manipulation

Risk 10 (Agent Goal and Instruction Manipulation) and Risk 6 (Agent Memory and Context Manipulation) cover prompt injection and memory poisoning - the same class of attack documented in our writeup of multi-agent prompt injection and credential theft. These are runtime conditions; you surface them by watching agent behavior for anomalies, not by reading a static config once.

Traceability is a first-class risk

Risk 9 (Agent Untraceability) deserves special attention because frameworks rarely name the *absence* of observability as a vulnerability in its own right. AIVSS does - and it reinforces it with the Opacity & Reflexivity factor, which scores the inability to trace why an action was taken. If an agent acts asynchronously and autonomously with no reliable attribution or logging, that lack of an audit trail is itself a scorable risk - which is the same premise behind "you can't govern what you can't see."

Operationalizing AIVSS in a security program

AIVSS is only as good as the data you feed it. Turning the framework into a repeatable workflow looks like this:

1. Inventory the fleet. Enumerate every AI agent and MCP server in your environment. This is the prerequisite - an unknown agent has no score and no owner.
2. Profile the amplification factors per asset. For each agent, record its 10-factor profile to produce the Factor_Sum: is it autonomous, how broad is its tool surface, does it persist memory, can it shift identity, does it talk to other agents. This characterizes the agentic uplift even before any specific CVE is known.
3. Attach CVSS base scores. When a technical flaw is identified (in a model, dependency, gateway, or tool), pull or estimate its CVSS v4.0 base score.
4. Compute and prioritize. Run the procedure - Risk_Gap, AARS uplift, then the final AIVSS score - and triage on the result. Use the recorded inputs to decide *how* to fix: a low-CVSS finding pushed high by a large agentic uplift often calls for constraining the agent (reduce autonomy, scope the tool surface) rather than patching code.
5. Report and audit. AIVSS ships an AIVSS-Agentic Report JSON schema and references an SSVC decision-tree alternative for stakeholders who prefer qualitative triage - feed these into GRC and audit workflows so the score is board-reportable and reproducible.
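The first four steps above can be strung together as a rough triage sketch. Everything specific here is assumed for illustration: the asset names, finding IDs, Factor_Sum values, and CVSS scores are invented, ThM is fixed at the 0.97 used in the v0.8 worked examples, and the "constrain vs. patch" hint is our own heuristic rather than anything AIVSS defines. The reporting step is left out because this sketch does not model the official report schema.

```python
from decimal import Decimal, ROUND_HALF_UP

THM = Decimal("0.97")  # value cited from the v0.8 worked examples


def aivss(cvss_base, factor_sum, mitigation_factor="1.0"):
    """Return (final score, AARS uplift) for one finding."""
    base = Decimal(str(cvss_base))
    aars = (Decimal(10) - base) * Decimal(str(factor_sum)) / Decimal(10) * THM
    raw = (base + aars) * Decimal(str(mitigation_factor))
    return raw.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP), aars


# Steps 1-2: inventoried assets with their profiled Factor_Sum (hypothetical).
inventory = {
    "ci-coding-agent": 8.0,
    "docs-search-mcp": 2.5,
    "support-triage-agent": 5.5,
}

# Step 3: technical findings with CVSS v4.0 base scores (hypothetical).
findings = [
    {"id": "F-1", "asset": "ci-coding-agent", "cvss_base": 3.1},
    {"id": "F-2", "asset": "docs-search-mcp", "cvss_base": 7.5},
    {"id": "F-3", "asset": "support-triage-agent", "cvss_base": 5.0},
]


def triage(findings, inventory):
    """Step 4: score every finding and sort highest AIVSS first."""
    rows = []
    for finding in findings:
        score, aars = aivss(finding["cvss_base"], inventory[finding["asset"]])
        # Illustrative heuristic only: if the uplift outweighs the technical
        # base, constraining the agent is probably the first lever to pull.
        if aars > Decimal(str(finding["cvss_base"])):
            hint = "constrain the agent"
        else:
            hint = "patch the flaw"
        rows.append({**finding, "aivss": score, "aars": aars, "hint": hint})
    return sorted(rows, key=lambda row: row["aivss"], reverse=True)


ranked = triage(findings, inventory)
for row in ranked:
    print(row["id"], row["asset"], row["aivss"], row["hint"])
```

In this made-up data set the CVSS 3.1 finding on the highly agentic coding agent ranks first at 8.5, ahead of the CVSS 7.5 finding on the narrowly scoped MCP server at 8.1, which is the reordering the uplift is designed to produce.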

The hard part is steps 1 and 2. A CVSS base score is a known quantity your team already produces. The Factor_Sum, by contrast, requires accurate, *current* knowledge of how every agent in your environment is configured - and agents change. Coding assistants gain new tools weekly; MCP servers are added by developers without review, as we detail in securing AI coding agents and CLIs. A factor profile captured manually in a spreadsheet is stale within a sprint.
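One way to catch a stale profile is to diff successive snapshots of an asset's factor levels. The sketch below assumes profiles are stored as plain dictionaries keyed by factor name; the snapshots are invented and show only three factors for brevity.

```python
def profile_drift(previous, current):
    """Return {factor: (old_level, new_level)} for every factor that changed."""
    changed = {}
    for factor in sorted(set(previous) | set(current)):
        old, new = previous.get(factor), current.get(factor)
        if old != new:
            changed[factor] = (old, new)
    return changed


def factor_sum_delta(previous, current):
    """Net change in Factor_Sum between two snapshots of the same factors."""
    return sum(current.values()) - sum(previous.values())


# Hypothetical snapshots of one coding assistant, one sprint apart.
last_sprint = {
    "execution_autonomy": 0.5,
    "external_tool_control_surface": 0.5,
    "persistent_state_retention": 0.0,
}
this_sprint = {
    "execution_autonomy": 0.5,
    "external_tool_control_surface": 1.0,  # new tools added by a developer
    "persistent_state_retention": 0.5,     # memory feature switched on
}

print(profile_drift(last_sprint, this_sprint))
print(factor_sum_delta(last_sprint, this_sprint))
```

Any non-empty drift result means every finding already scored against that asset should be recomputed with the new Factor_Sum.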

Where continuous agent and MCP visibility fits

This is the practical bridge between AIVSS as a methodology and AIVSS as something you actually run. The 10 amplification factors are, almost line for line, the attributes a discovery system has to inventory per agent and MCP server: autonomy, tool surface, persistence, identity, multi-agent links. The behavioral risks - Goal and Instruction Manipulation, Memory and Context Manipulation - are what runtime anomaly detection surfaces. And Risk 9, untraceability, is solved precisely by maintaining the audit trail AIVSS treats as a first-class concern.

Anomity sits in this layer. By continuously discovering every agent and MCP server on the fleet (see Inside Anomity Discovery), monitoring their permissions and behavior, and recording an attributable audit trail, the raw inputs to an AIVSS score become a byproduct of normal operations rather than a manual audit exercise. In principle the Factor_Sum for each discovered asset can be populated from inventory metadata - turning a list of agents into a prioritized, reportable risk picture. We make this point not as a pitch but as an observation about data flow: AIVSS formalizes the exact dimensions a visibility platform already has to measure.
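As a rough illustration of that data flow, the sketch below derives three of the ten factor levels from an inventory record. The `AssetRecord` fields, the lookup tables, and the cut-offs are all hypothetical: they are not taken from AIVSS or from any product's schema, and a real mapping would need to be agreed with whoever owns the scoring. The remaining seven factors are left for analyst input.

```python
from dataclasses import dataclass

# Hypothetical mappings from inventory metadata to the 0.0 / 0.5 / 1.0 scale.
APPROVAL_LEVELS = {"all_actions": 0.0, "some_actions": 0.5, "none": 1.0}
MEMORY_LEVELS = {"none": 0.0, "session": 0.5, "cross_session": 1.0}


@dataclass(frozen=True)
class AssetRecord:
    """Illustrative inventory record for one agent or MCP server."""
    name: str
    human_approval: str         # key of APPROVAL_LEVELS
    tool_count: int
    has_privileged_tools: bool  # e.g. write or admin scopes
    memory_scope: str           # key of MEMORY_LEVELS


def derive_factor_levels(asset):
    """Derive a partial factor profile from inventory metadata."""
    if asset.tool_count == 0:
        tool_surface = 0.0
    elif asset.has_privileged_tools:
        tool_surface = 1.0
    else:
        tool_surface = 0.5
    return {
        "execution_autonomy": APPROVAL_LEVELS[asset.human_approval],
        "external_tool_control_surface": tool_surface,
        "persistent_state_retention": MEMORY_LEVELS[asset.memory_scope],
    }


coding_agent = AssetRecord("ci-coding-agent", "none", 12, True, "cross_session")
search_server = AssetRecord("docs-search-mcp", "all_actions", 3, False, "none")

for asset in (coding_agent, search_server):
    levels = derive_factor_levels(asset)
    print(asset.name, levels, "partial sum:", sum(levels.values()))
```

The point of the sketch is the direction of the data flow rather than the specific thresholds: each factor level is read off metadata that discovery already collects.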

The takeaway

AIVSS is young, still in draft, and worth adopting early. It does one thing CVSS cannot: it makes the *agentic context* a measurable, first-class part of severity, so a quiet flaw attached to a high-autonomy agent stops hiding behind a mid-range CVSS number. Read the v0.8 methodology, pilot the scoring procedure on a handful of real agents, and pay attention to whether you can even answer the factor questions for your fleet. If you can't, that gap - not the formula - is your first finding. As you scale, the constraint won't be the math; it will be keeping an accurate, live picture of every agent and MCP server you need to score, which is the same governance problem behind governing AI coding assistants across your fleet.

Frequently asked questions

What is OWASP AIVSS?

AIVSS is the OWASP AI Vulnerability Scoring System, an open OWASP project that creates a standardized, quantifiable way to score security risk in AI systems, starting with agentic AI. It pairs a taxonomy of 10 Agentic AI Core Security Risks with a scoring model that extends a CVSS v4.0 base score using an agentic uplift, the Agentic AI Risk Score (AARS).

How is AIVSS different from CVSS?

CVSS scores the technical severity of a flaw in isolation. AIVSS keeps a CVSS v4.0 base score but adds an agentic uplift (AARS) that captures how an agent's autonomy, tool surface, identity, and other capabilities amplify that flaw. The uplift fills the gap between the CVSS base and a maximum of 10, scaled by how agentic the system is.

What is the AIVSS v0.8 formula?

AIVSS = (CVSS_Base + AARS) × Mitigation_Factor. The uplift is AARS = (10 − CVSS_Base) × (Factor_Sum / 10) × ThM, where Factor_Sum is the 0–10 sum of the 10 amplification factors, ThM is a Threat Multiplier tied to exploit maturity (examples use 0.97), and Mitigation_Factor defaults to 1.0. The final score is rounded to one decimal on a 0–10 scale.

What are the 10 OWASP Agentic AI Core Security Risks?

Agentic AI Tool Misuse, Agent Access Control Violation, Agent Cascading Failures, Agent Orchestration and Multi-Agent Exploitation, Agent Identity Impersonation, Agent Memory and Context Manipulation, Insecure Agent Critical Systems Interaction, Agent Supply Chain and Dependency Risk, Agent Untraceability, and Agent Goal and Instruction Manipulation.

What are the AIVSS amplification factors?

Ten properties of agent design, each scored 0.0 (none), 0.5 (partial), or 1.0 (full): Execution Autonomy, External Tool Control Surface, Natural Language Interface, Contextual Awareness, Behavioral Non-Determinism, Opacity and Reflexivity, Persistent State Retention, Dynamic Identity, Multi-Agent Interactions, and Self-Modification. Their sum is the Factor_Sum (0–10) used in the AARS uplift.

Is AIVSS a finished, official standard?

Not yet. AIVSS is pre-1.0 and actively evolving. The current scoring release is v0.8 (March 19, 2026); v0.5 was the initial release. It is an open OWASP community project, aligned with the NIST AI Risk Management Framework and informed by the Cloud Security Alliance MAESTRO threat model, so expect refinement before a 1.0.

Does AIVSS cover MCP servers?

Yes, directly. The External Tool Control Surface factor and the Tool Misuse and Supply Chain risk categories map to MCP server behavior - the tools an MCP server exposes, its scope, and its provenance. Scoring those risks requires knowing which MCP servers exist and what they can do.

How do you operationalize AIVSS in a security program?

Inventory every agent and MCP server, capture each one's amplification-factor profile (the Factor_Sum), pull or estimate a CVSS base score for known flaws, compute the AARS uplift and the final AIVSS score, then prioritize remediation by that score while reading the inputs to see whether risk is technical, agentic, or both. Feed the output into GRC and audit workflows.