
Runtime Monitoring and Anomaly Detection for AI Agents: Building Behavioral Baselines

Anomity Research · Jun 21, 2026 · 11 min read

TL;DR

AI agents run a plan-act-iterate loop that touches tools, data, and identities your SIEM and EDR were never built to correlate, so security has to shift from pre-deployment posture to continuous runtime monitoring.
Collect four signal classes per agent: tool calls, data access, the full identity chain, and outputs/egress, plus cognitive events like plan.start and memory.read that conventional telemetry never sees.
Baseline behavior per agent class (a coding agent is not a support agent), using SPC methods: Shewhart charts for large shifts and EWMA/CUSUM for the slow drift that single-event scoring misses.
Correlate action chains (tool call to data access to egress) instead of scoring isolated events; the sequence is what reveals an intent shift, including the lethal trifecta.
Runtime visibility is the prerequisite for enforcement: continuous authorization re-evaluation, graduated containment, and auto-revocation on task completion all need a live behavioral signal to act on.

An AI agent does not log in once and sit idle. It receives an objective, plans, calls external tools, reads and writes data, evaluates the result, and iterates, often dozens of times before it finishes a single task. Each step touches a system, a data source, or an API endpoint that your SIEM has no record of and your EDR never sees. By the time the work is done, the agent has executed a chain of credentialed actions that no human-centric monitoring tool was designed to follow, let alone correlate.

That gap is the subject of this guide. Most AI security effort still concentrates on posture: static permission reviews, training-time alignment, pre-deployment red-teaming. Those controls matter, but they share a blind spot. The behaviors that get organizations breached (recursive planning loops, goal drift, cascading tool chains) are emergent: they arise dynamically at runtime and elude pre-deployment controls. You cannot review your way out of a risk that only exists while the agent is running. You have to watch it run.

This is a practitioner's guide to doing exactly that: what signals to collect, how to turn them into per-agent-class behavioral baselines using established statistical methods, how to detect the slow drift that single-event alerts miss, and how that live signal feeds enforcement. It is deliberately concrete. If you operate, govern, or secure a fleet of agents, you should be able to act on it.

What runtime monitoring actually means for agents

Runtime monitoring is the continuous collection and analysis of an agent's behavior while it executes. The unit of observation is not a login event or a process spawn; it is the agent's loop and the actions that loop produces. The premise is simple: an agent's intent is not visible in its configuration, only in its behavior, so the behavior is what you instrument.

This is a shift in altitude. Traditional detection asks "is this single action allowed?" Runtime monitoring asks "is this sequence of actions consistent with what this agent normally does and what it was asked to do?" The first question is answerable by a policy engine. The second requires a baseline, a memory of normal, and the ability to correlate events across the whole loop. That distinction drives everything that follows.

Why SIEM, EDR, and IAM miss it

The tools you already own were built for a different actor. SIEM correlates logs, EDR watches endpoints, IAM evaluates human-style logins. None was designed to follow chains of credentialed actions an agent takes across identity providers, vaults, and applications in a single workflow. The result is a structural blind spot rather than a tuning problem.

No system of record. There is typically no authoritative inventory of what agents exist, who owns them, or what they are doing. Detection that has no denominator cannot reason about anomalies.
Wrong correlation model. Human behavioral analytics looks for an anomalous login or an unusual endpoint process. An agent's risk lives in the relationship between its tool calls, its data access, and its identity, which human-centric tools do not join.
Identity is a chain, not a login. An agent action carries an invoking human identity, an agent identity, a tool authorization, and a set of resource entitlements at once. Evaluating only the login misses where the privilege actually flows.
Scale and invisibility. Machine and agent identities now vastly outnumber humans, and many operate without anyone in security knowing they exist. That is the second half of Anomity's name, anonymity, made literal.

This is the same shadow-IT dynamic we cover in AI agents are the new shadow IT: capability arrives in the organization faster than the controls that should govern it, and the monitoring stack is the part that lags hardest.

The four signal classes to collect

Effective monitoring starts with the right telemetry. There are four observable signal classes, plus a fifth that conventional infrastructure never produces and that turns out to be decisive.

Signal class	What to capture	Example anomaly
Tool calls	Which tool, arguments, frequency, novel tool combinations	An agent that has never used `shell` suddenly invokes it
Data access	Source, record count, sensitivity, egress volume	Read volume from a customer table jumps 20x in one session
Identity chain	Invoking human, agent identity, tool auth, resource entitlements	Agent action runs under a more privileged role than its task needs
Outputs / egress	Responses, exfil-shaped content, external destinations	Base64-encoded blob posted to an unfamiliar external endpoint
Cognitive events	`plan.start`, `goal.set`, `memory.read`, `memory.write`	Stated goal mutates mid-session away from the assigned task

That last row is the differentiator. Cognitive or reasoning telemetry (internal plan and state changes) is invisible to network logs and endpoint agents, but it is exactly where goal drift and memory poisoning first show up. The MI9 runtime governance framework formalizes this as agent-semantic telemetry, captured through its Agentic Telemetry Schema (ATS), which records these events alongside the actions they produce.
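To make the five signal classes concrete, here is a minimal sketch of a single event record that could carry all of them. The class and field names (`AgentEvent`, `SignalClass`, `agent_class`, and so on) are illustrative assumptions, not a schema taken from MI9 or any vendor.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class SignalClass(Enum):
    TOOL_CALL = "tool_call"
    DATA_ACCESS = "data_access"
    IDENTITY_CHAIN = "identity_chain"
    EGRESS = "egress"
    COGNITIVE = "cognitive"


@dataclass(frozen=True)
class AgentEvent:
    """One observation from an agent's loop (hypothetical schema)."""
    agent_id: str
    agent_class: str          # e.g. "coding" or "support"; baselines are per class
    session_id: str
    signal: SignalClass
    name: str                 # e.g. "shell", "plan.start", "memory.write"
    attributes: dict[str, Any] = field(default_factory=dict)


def cognitive_events(events: list[AgentEvent]) -> list[AgentEvent]:
    """Select the events that network and endpoint telemetry never see."""
    return [e for e in events if e.signal is SignalClass.COGNITIVE]


events = [
    AgentEvent("a-1", "support", "s-9", SignalClass.COGNITIVE, "goal.set",
               {"goal": "resolve_ticket"}),
    AgentEvent("a-1", "support", "s-9", SignalClass.TOOL_CALL, "kb_search",
               {"query_len": 42}),
    AgentEvent("a-1", "support", "s-9", SignalClass.DATA_ACCESS, "tickets",
               {"records": 3, "sensitivity": "internal"}),
]
print([e.name for e in cognitive_events(events)])
```

Keeping cognitive events in the same record shape as tool calls and data access is what lets later detectors join a `goal.set` to the actions that followed it.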

Standardize on OpenTelemetry GenAI conventions

You do not have to invent the schema. The OpenTelemetry GenAI semantic conventions define spans such as invoke_agent, chat, and execute_tool, with attributes like gen_ai.input.messages and gen_ai.output.messages plus token and latency metrics. Frameworks including LangChain, CrewAI, and AutoGen can emit these natively. Adopting the standard early means your baselines are portable across agent frameworks instead of locked to one vendor's logs.
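As a rough illustration of the span shape, the sketch below records nested `invoke_agent` and `execute_tool` spans using only the standard library. It does not call the real OpenTelemetry SDK: `genai_span` and `RECORDED_SPANS` are hypothetical stand-ins, and the attribute keys (`gen_ai.operation.name`, `gen_ai.agent.name`, `gen_ai.tool.name`) follow our reading of the GenAI conventions, which are still evolving, so check them against the current spec before relying on them.

```python
import time
from contextlib import contextmanager

RECORDED_SPANS = []  # stand-in for a real span exporter


@contextmanager
def genai_span(operation, target, attributes=None):
    """Record one span named '<operation> <target>' with GenAI-style attributes."""
    record = {
        "name": f"{operation} {target}",
        "attributes": {"gen_ai.operation.name": operation, **(attributes or {})},
    }
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - start
        RECORDED_SPANS.append(record)


# An agent invocation that wraps a single tool execution.
with genai_span("invoke_agent", "support-agent",
                {"gen_ai.agent.name": "support-agent"}):
    with genai_span("execute_tool", "kb_search",
                    {"gen_ai.tool.name": "kb_search"}):
        pass  # the real tool call would run here

for s in RECORDED_SPANS:
    print(s["name"], s["attributes"])
```

In a real deployment the framework's own instrumentation or the OpenTelemetry SDK would produce these spans; the point here is only that tool calls nest under the agent invocation that caused them.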

Building behavioral baselines per agent class

A baseline is a statistical profile of normal. The single most common mistake is building one global baseline for all agents. A coding agent that clones repos, runs builds, and calls package registries has a completely different normal profile from a customer-support agent that reads a knowledge base and writes ticket updates. Baseline per agent class or workflow, never across the whole fleet, or you will simultaneously drown in false positives on the busy agents and miss real anomalies on the quiet ones.

For each class, profile the distributions that matter: daily tool-call counts, the set of tools normally used together, data-access volume by sensitivity tier, and typical egress destinations. Those distributions become the reference the live signal is measured against.
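A small sketch shows why the per-class split matters. The numbers are made up for illustration, and `build_baselines` and `z_score` are hypothetical helper names: the same observation (100 tool calls in a day) is wildly abnormal for a support agent, yet a fleet-wide baseline would wave it through.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baselines(observations):
    """observations: iterable of (agent_class, daily_tool_calls) pairs."""
    by_class = defaultdict(list)
    for agent_class, count in observations:
        by_class[agent_class].append(count)
    # A standard deviation needs at least two data points per class.
    return {
        cls: {"mean": mean(vals), "stdev": stdev(vals), "n": len(vals)}
        for cls, vals in by_class.items()
        if len(vals) >= 2
    }


def z_score(baseline, value):
    """Distance of a new observation from the baseline mean, in sigmas."""
    return (value - baseline["mean"]) / baseline["stdev"]


# Illustrative history: coding agents are busy, support agents are quiet.
history = [
    ("coding", 400), ("coding", 420), ("coding", 380),
    ("support", 40), ("support", 44), ("support", 36),
]
per_class = build_baselines(history)
fleet_wide = build_baselines([("all", count) for _, count in history])["all"]

z_support = z_score(per_class["support"], 100)  # far outside normal
z_global = z_score(fleet_wide, 100)             # looks unremarkable
print(round(z_support, 1), round(z_global, 1))
```

Three days of history is far too little for a production baseline; the sample is kept tiny so the arithmetic is easy to follow.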

SPC: the statistical core

Statistical Process Control gives you mature, well-understood machinery for exactly this problem, detecting when a process has shifted away from its established mean. Three methods matter, and they are complementary rather than interchangeable:

Shewhart charts flag large, sudden shifts (around three sigma) efficiently. Good for a tool-call rate that spikes 10x in an hour; poor at catching slow creep.
EWMA (Exponentially Weighted Moving Average) weights recent observations more heavily and is highly sensitive to small, gradual shifts of one to two sigma. This is the method for catching a behavior that drifts a little each day until it is somewhere it should never be.
CUSUM (Cumulative Sum) accumulates small deviations over time and, like EWMA, excels at detecting persistent small shifts that a Shewhart chart would never trip. Its reference value k is conventionally set to half the shift size you most want to catch, and its performance is broadly comparable to a well-tuned EWMA.

The practical lesson: a single-event threshold catches the spike but misses the creep. An agent whose daily data-access volume grows two percent a day compounds to roughly double its footprint in about 35 days (1.02^35 ≈ 2), yet no single day's increase looks alarming, and a hard limit set with any headroom stays silent for weeks. EWMA and CUSUM are designed to catch precisely that trajectory. Tune the weighting (the lambda parameter in EWMA) to match how fast you need to react versus how much noise you can tolerate: a smaller lambda gives the chart a longer memory and more sensitivity to small shifts, while a larger one reacts faster to big ones.
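The creep scenario can be simulated in a few lines. This is a toy sketch under stated assumptions: a noise-free series, a baseline of 1,000 records a day with a standard deviation of 50, a hypothetical hard limit at twice the baseline, and an EWMA chart with lambda 0.2 and a three-sigma steady-state control limit. None of these values come from a real deployment.

```python
import math

BASELINE = 1000.0           # assumed mean daily records read
SIGMA = 50.0                # assumed day-to-day standard deviation
GROWTH = 1.02               # two percent compounding growth per day
HARD_LIMIT = 2 * BASELINE   # hypothetical static threshold
LAM, L = 0.2, 3.0           # assumed EWMA weight and control-limit width

# Steady-state EWMA upper control limit:
#   mu + L * sigma * sqrt(lam / (2 - lam))
EWMA_LIMIT = BASELINE + L * SIGMA * math.sqrt(LAM / (2 - LAM))

ewma = BASELINE
hard_day = ewma_day = None
for day in range(1, 91):
    volume = BASELINE * GROWTH ** day
    # EWMA update: blend today's value with the running average.
    ewma = LAM * volume + (1 - LAM) * ewma
    if ewma_day is None and ewma > EWMA_LIMIT:
        ewma_day = day
    if hard_day is None and volume > HARD_LIMIT:
        hard_day = day

print(f"EWMA limit crossed on day {ewma_day}; hard limit crossed on day {hard_day}")
```

In this idealized series the EWMA chart signals within the first week, while the static limit takes about five weeks. Real telemetry is noisy, so expect the gap to narrow and false alarms to appear; the comparison is directional, not a benchmark.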

Because no single chart catches both regimes, established SPC practice combines a Shewhart scheme with EWMA or CUSUM so the system flags large shifts and small sustained drift at once. That matters here because agent behavior is not stationary: the baseline itself has to move with legitimate change without erasing the memory that makes drift visible.
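A combined scheme can be sketched as one pass over standardized observations: a Shewhart test on each point plus a two-sided tabular CUSUM. The parameters `k=0.5` and `h=5.0` (in sigma units) are common textbook starting points rather than recommendations for any particular fleet, and `monitor` is a hypothetical function name.

```python
def monitor(series, mu, sigma, k=0.5, h=5.0, shewhart_l=3.0):
    """Return (index, chart) alerts from a combined Shewhart + CUSUM scheme."""
    c_hi = c_lo = 0.0
    alerts = []
    for t, x in enumerate(series):
        z = (x - mu) / sigma
        # Shewhart: a single point far from the mean.
        if abs(z) > shewhart_l:
            alerts.append((t, "shewhart"))
        # Tabular CUSUM: accumulate deviations beyond the slack k.
        c_hi = max(0.0, c_hi + z - k)
        c_lo = max(0.0, c_lo - z - k)
        if c_hi > h or c_lo > h:
            alerts.append((t, "cusum"))
            c_hi = c_lo = 0.0  # restart the sums after signalling
    return alerts


# A one-off spike trips Shewhart only.
spike = monitor([100, 102, 98, 150, 100], mu=100, sigma=10)
# A sustained +1.5 sigma shift never trips Shewhart but accumulates in CUSUM.
drift = monitor([115] * 6, mu=100, sigma=10)
print(spike, drift)
```

Each chart covers the other's blind spot: the spike is invisible to CUSUM at these settings, and the sustained shift is invisible to the three-sigma rule.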

Detection patterns that beat single-event scoring

Baselines tell you when one metric drifts. The harder, more valuable detection comes from correlating sequences. Three patterns separate effective systems from noisy ones.

Action-chain correlation

Score the chain, not the click. The sequence tool call to data access to egress can be entirely benign at each individual step and malicious as a whole. Systems that only score isolated events against a baseline miss the sequence that reveals the intent shift. MI9's conformance engine models this with finite-state machines that catch multi-step violations, for example a path of market research, then client outreach, then trade execution, where each action is individually compliant but the combination breaks policy.
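The idea of a multi-step violation can be illustrated with a very small state machine. This is not MI9's conformance engine, only a sketch of the general technique: it advances one state each time the next action in a forbidden path appears, and reports the step that completes the path. The action labels are hypothetical.

```python
FORBIDDEN_PATH = ("market_research", "client_outreach", "trade_execution")


def first_violation(actions, forbidden=FORBIDDEN_PATH):
    """Return the index of the action that completes the forbidden path.

    The path is matched as an ordered subsequence, so unrelated actions
    in between do not reset the machine. Returns None if never completed.
    """
    state = 0  # number of forbidden steps seen so far, in order
    for i, action in enumerate(actions):
        if action == forbidden[state]:
            state += 1
            if state == len(forbidden):
                return i
    return None


session = ["market_research", "summarize", "client_outreach", "trade_execution"]
print(first_violation(session))                                  # completes at the last step
print(first_violation(["market_research", "trade_execution"]))   # outreach never happened
```

Every action in `session` would pass a per-event policy check; only the ordered combination is the violation, which is why the detector has to carry state across the session.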

The lethal trifecta as a runtime detector

The cleanest example of why chains matter is Simon Willison's lethal trifecta (June 2025): an agent with access to private data, exposure to untrusted content, and the ability to communicate externally is vulnerable to indirect prompt injection regardless of model alignment or prompt hardening. Willison's only reliable prevention is architectural: remove one leg. But runtime monitoring can detect when a single session activates all three legs at once and fire an alert or trigger containment. That is action-chain correlation reduced to its sharpest form, and it ties directly to indirect prompt injection and the lethal trifecta as a data-exfiltration path.
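A session-level trifecta detector can be sketched as a running union of the legs each tool call activates. The tool names and the `TOOL_LEGS` mapping are assumptions for illustration; in practice that mapping comes from your tool inventory and data classification.

```python
TRIFECTA = frozenset({"private_data", "untrusted_content", "external_comms"})

# Hypothetical mapping from tool name to the trifecta legs it activates.
TOOL_LEGS = {
    "crm_read": frozenset({"private_data"}),
    "web_fetch": frozenset({"untrusted_content"}),
    "http_post": frozenset({"external_comms"}),
    "calculator": frozenset(),
}


def trifecta_step(tool_calls):
    """Return the index of the call that activates the third leg, else None."""
    active = set()
    for i, tool in enumerate(tool_calls):
        active |= TOOL_LEGS.get(tool, frozenset())
        if active >= TRIFECTA:
            return i
    return None


print(trifecta_step(["crm_read", "calculator", "web_fetch", "http_post"]))
print(trifecta_step(["web_fetch", "http_post"]))
```

Unknown tools contribute no legs in this sketch; a stricter design might treat an unmapped tool as activating all three until it has been classified.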

Goal-conditioned drift detection

The central challenge in agent anomaly detection is telling a compromised agent apart from one that is legitimately adapting. Flag every change and you bury the SOC in noise; flag nothing and you miss the takeover. The pattern that works is a goal-conditioned drift indicator: compare current behavior against a baseline conditioned on the agent's stated objective. Behavior that stays consistent with the goal is adaptation; behavior that diverges from it, acquiring tools or data access the task never required, is the signal worth escalating.
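One simple way to approximate a goal-conditioned indicator is to score the share of observed tools that the stated goal never called for. This is a deliberately crude sketch, not the MI9 formulation: `GOAL_PROFILES` and the tool names are hypothetical, and a real system would learn the profile from baseline sessions rather than hard-code it.

```python
# Hypothetical per-goal profiles: tools a task with this goal normally needs.
GOAL_PROFILES = {
    "resolve_ticket": {"kb_search", "ticket_update"},
    "fix_build": {"git_clone", "shell", "package_install"},
}


def goal_drift(goal, observed_tools):
    """Return (score, unexpected_tools) for a session.

    score is the fraction of distinct observed tools that fall outside
    the goal's expected profile: 0.0 means fully consistent with the goal.
    """
    expected = GOAL_PROFILES[goal]
    observed = set(observed_tools)
    unexpected = observed - expected
    score = len(unexpected) / len(observed) if observed else 0.0
    return score, unexpected


print(goal_drift("resolve_ticket", ["kb_search", "ticket_update", "kb_search"]))
print(goal_drift("resolve_ticket", ["kb_search", "shell", "http_post"]))
```

Note that `shell` is normal under `fix_build` and anomalous under `resolve_ticket`: the same action is adaptation or drift depending on the objective it is measured against.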

The threat taxonomy your monitoring must cover

To know your detection is complete, map it against a published threat model. The OWASP Top 10 for Agentic Applications is the right reference: the 2026 edition was published in December 2025 by the OWASP Gen AI Security Project as a globally peer-reviewed agentic security framework, with a review board drawn from organizations including NIST, Microsoft, AWS, and the Alan Turing Institute. Every category maps to at least one runtime signal; the table below shows five of them.

OWASP Agentic risk	Runtime signal that surfaces it
ASI01 Agent Goal Hijack	Goal-conditioned drift; cognitive `goal.set` telemetry
ASI02 Tool Misuse	Tool-call baseline anomaly; novel tool combinations
ASI03 Identity & Privilege Abuse	Identity-chain correlation; entitlement vs. task mismatch
ASI06 Memory & Context Poisoning	`memory.read` / `memory.write` telemetry and baselines
ASI10 Rogue Agents	Discovery gaps; agents with no owner or no baseline

Memory poisoning deserves a specific note. The MINJA attack (arXiv:2503.03704, a NeurIPS 2025 poster) poisons an agent's memory bank through query-only interaction, with no elevated privileges, and reported high injection and attack success rates in the paper's evaluation. It is a clean justification for treating persistent memory as an attack surface and instrumenting memory reads and writes as first-class signals rather than implementation detail. We go deeper in AI agent memory poisoning explained.

Identity and MCP: the runtime context

Runtime monitoring does not happen in a vacuum; it sits on top of an identity surface that has exploded. CyberArk's 2025 Identity Security Landscape reports machine identities outnumbering humans by more than 80 to 1 (the report's headline ratio is 82:1), with nearly half of machine identities holding privileged or sensitive access. (Vendor ratios vary, so treat 82:1 as a well-sourced anchor rather than a universal constant.) The takeaway is directional and solid: the entities your monitoring must cover are overwhelmingly non-human, frequently privileged, and often ungoverned. That is the foundation for non-human identity governance.

For MCP specifically, the authorization spec builds on OAuth 2.1 and requires the Authorization Code flow with PKCE (RFC 7636, S256 method) for public clients such as agents. Runtime monitoring should verify that the MCP servers your agents connect to actually implement these controls and flag any that do not, which makes non-compliant or unvetted MCP servers a discrete, alertable category. The full picture is in our MCP server security guide and OAuth for MCP servers explained.
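For reference, the S256 transformation itself is small: the code challenge is the base64url-encoded SHA-256 digest of the code verifier, without padding. The sketch below implements it and checks it against the example in RFC 7636 Appendix B. The `supports_s256` helper is a hypothetical monitoring check that reads the `code_challenge_methods_supported` field of an authorization server's metadata document (RFC 8414); whether a given MCP server publishes that field is an assumption you should verify.

```python
import base64
import hashlib
import secrets


def new_code_verifier() -> str:
    """Generate a random 43-character verifier from the URL-safe alphabet."""
    return secrets.token_urlsafe(32)


def s256_challenge(code_verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), no padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def supports_s256(metadata: dict) -> bool:
    """Hypothetical check: does the server metadata advertise S256?"""
    return "S256" in metadata.get("code_challenge_methods_supported", [])


# Example values from RFC 7636, Appendix B.
verifier = "dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk"
print(s256_challenge(verifier))
print(supports_s256({"code_challenge_methods_supported": ["plain"]}))
```

A server whose metadata omits S256, or offers only `plain`, is the kind of finding worth raising as its own alert category.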

Closing the loop: from monitoring to enforcement

Visibility is the prerequisite for enforcement, not a substitute for it. A live behavioral signal is what makes graduated, reversible response possible. Three patterns turn detection into action:

Continuous Authorization Monitoring re-evaluates an agent's permissions as its goal or behavior drifts, so a privilege that was appropriate at task start can be narrowed mid-flight before it is abused.
Graduated Containment escalates across levels (monitoring augmentation, planning intervention, tool restriction, and execution isolation) rather than abruptly terminating an agent in a way that corrupts its state.
Auto-revocation removes credentials on task completion under a zero-standing-privilege, just-in-time model, so an agent holds power only while it is doing the work that needs it. A global token-revocation kill switch is the hard stop behind all of it.
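The graduated part can be sketched as a mapping from a drift score to a containment level. The four level names follow the list above; the numeric thresholds, the 0-to-1 score scale, and the function names are assumptions for illustration only.

```python
from enum import IntEnum


class Containment(IntEnum):
    NONE = 0
    MONITORING_AUGMENTATION = 1
    PLANNING_INTERVENTION = 2
    TOOL_RESTRICTION = 3
    EXECUTION_ISOLATION = 4


# Hypothetical thresholds on a 0..1 drift score, one per level.
THRESHOLDS = (0.2, 0.4, 0.6, 0.8)


def containment_for(drift_score: float) -> Containment:
    """Pick the level whose threshold the score has reached."""
    return Containment(sum(drift_score >= t for t in THRESHOLDS))


def escalate(current: Containment, drift_score: float) -> Containment:
    """Only ever tighten automatically; loosening is left to a human review."""
    return max(current, containment_for(drift_score))


print(containment_for(0.25).name)
print(escalate(Containment.TOOL_RESTRICTION, 0.1).name)
```

Making de-escalation a manual step is one design choice among several; the property worth keeping is that the response is proportionate and the agent's state survives it.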

None of these can act without a behavioral signal to trigger on. That is the through-line of this guide: the baseline is not the deliverable, it is the input to enforcement. For the principles underneath, see least privilege for AI agents and the broader OWASP agentic controls checklist.

A practical rollout sequence

Inventory first. You cannot baseline an agent you do not know exists. Build a system of record for every agent and MCP server, with an owner for each.
Instrument with OpenTelemetry GenAI conventions. Emit invoke_agent, execute_tool, and message spans, plus cognitive events, so the signal is standardized from day one.
Classify agents and baseline per class. Group by workflow, profile each group's normal tool, data, identity, and egress distributions.
Layer SPC. Use Shewhart for sharp spikes and EWMA/CUSUM for slow drift; condition drift detection on each agent's stated goal.
Correlate chains. Add lethal-trifecta and tool-to-data-to-egress detectors on top of per-metric baselines.
Wire detection to enforcement. Connect alerts to continuous authorization, graduated containment, and auto-revocation, then keep an immutable audit trail for compliance.

Where continuous agent and MCP visibility fits

This guide describes a discipline, not a product, and you can assemble most of it from open standards and SPC math. The hard part in practice is the part that comes before any chart: maintaining a live, authoritative inventory of every agent and MCP server, with baselines that stay current as the fleet changes. That is the layer Anomity is built for: discovery and inventory feed per-agent-class baselines, permission and identity-chain monitoring supply the signals, anomaly alerting watches for drift, and the audit trail satisfies SOC 2, GDPR, and PCI evidence requirements. The name says it plainly: anomaly detection over anonymity, behavioral baselining applied to the agents most monitoring stacks cannot see.

If you want the discovery side in detail, inside Anomity discovery covers how the inventory is built, and auditing Claude Code across a fleet walks through the same runtime-visibility argument applied to a specific, fast-spreading agent. The principle behind all of it is the one we keep returning to: you can't govern what you can't see, and for AI agents, seeing means watching them run.

Frequently asked questions

What is AI agent monitoring?

AI agent monitoring is the continuous collection and analysis of an AI agent's runtime behavior: which tools it calls, which data it accesses, which identities and entitlements it uses, and what it sends externally. Unlike pre-deployment testing or static permission review, it observes the agent while it executes, because emergent behaviors like goal drift and cascading tool chains only appear at runtime. The goal is to establish a behavioral baseline and alert when live behavior deviates from it.

Why can't SIEM or EDR monitor AI agents?

SIEM, EDR, and IAM were built to evaluate human-style logins and endpoint processes, not chains of credentialed actions an agent takes across identity providers, vaults, and applications. There is typically no authoritative system of record for what agents exist, who owns them, or what they are doing. An agent's risk lives in the sequence of tool calls and data access correlated with its identity, which these tools were never designed to stitch into a single view.

What is a behavioral baseline for an AI agent?

A behavioral baseline is a statistical profile of what normal looks like for a given agent class: typical tool-call frequency and combinations, normal data-access volume and sensitivity, expected egress destinations, and reasoning patterns. Baselines are built per agent class or workflow rather than globally, because a coding agent and a customer-support agent have legitimately different normal profiles. Deviation from the baseline is the anomaly signal.

What is the difference between Shewhart, EWMA, and CUSUM for anomaly detection?

All three are statistical process control methods. Shewhart charts detect large, sudden shifts (around three sigma) efficiently but are slow to catch small drifts. EWMA (Exponentially Weighted Moving Average) and CUSUM are far more sensitive to small, gradual shifts of one to two sigma, which makes them ideal for catching slow behavioral drift in an agent before it crosses a hard threshold. Established practice combines a Shewhart scheme with EWMA or CUSUM so the system catches both large shifts and small sustained drift.

What is the lethal trifecta and how does runtime monitoring help?

The lethal trifecta, named by Simon Willison in June 2025, is the combination of three agent capabilities: access to private data, exposure to untrusted content, and the ability to communicate externally. Together they make an agent vulnerable to indirect prompt injection regardless of model alignment. Reliable prevention is architectural (remove one leg), but runtime monitoring can detect when a single session activates all three legs and trigger enforcement, making it a clear example of action-chain correlation.

How do you tell a compromised agent from one that is simply learning?

This is the central hard problem in agent anomaly detection. The pattern that works is a goal-conditioned drift indicator: compare current behavior against a baseline conditioned on the agent's stated objective, rather than flagging any change at all. Legitimate adaptation stays consistent with the goal; concerning drift diverges from it, often by acquiring new tools or data access that the assigned task never required. Naive anomaly detection that flags every change generates too much noise to be usable.

What signals should I log from an AI agent?

Capture tool calls (which tool, arguments, frequency, and novel combinations), data access (source, volume, sensitivity, egress), the full identity chain (invoking human, agent identity, tool authorization, resource entitlements), outputs and external communication, and cognitive events such as plan starts, goal sets, and memory reads. The OpenTelemetry GenAI semantic conventions standardize many of these as spans like invoke_agent, chat, and execute_tool, and frameworks such as LangChain, CrewAI, and AutoGen can emit them natively.

How does runtime monitoring connect to enforcement?

Runtime visibility is the prerequisite for enforcement, not a replacement for it. A live behavioral signal feeds continuous authorization monitoring, which re-evaluates permissions as the agent's goal or behavior drifts; graduated containment, which escalates from monitoring to planning intervention to tool restriction to execution isolation; and auto-revocation, which removes credentials on task completion under a zero-standing-privilege model. A global token revocation kill switch is the hard stop.