Shadow AI Data Exposure: What Employees Are Actually Pasting Into ChatGPT, Claude, and Copilot

Anomity Research · Jun 21, 2026 · 11 min read

TL;DR

Browser telemetry from LayerX found 77% of enterprise AI users paste data into GenAI prompts, with about 14 pastes a day into non-corporate accounts and at least 3 carrying sensitive data.
82% of those pastes go through unmanaged personal accounts that sit outside enterprise SSO and DLP, the single biggest blind spot.
IBM's 2025 Cost of a Data Breach Report found 1 in 5 breaches involved shadow AI, costing roughly $670K more than other breaches and disproportionately exposing PII and IP.
The root cause is behavioral: people experience a prompt as a conversation, not a data transmission, and prompting takes about four seconds while thinking through the consequences takes longer.
Traditional DLP cannot see browser paste events into personal accounts, so visibility has to come before control.
The same exposure compounds at machine speed when autonomous agents and MCP servers move data, which is where the lethal trifecta becomes a fleet-wide governance problem.

Open a new browser tab, paste in a customer list, and ask ChatGPT to clean it up. The whole thing takes about four seconds. Reasoning through where that data just went, who can read it, whether it survives in a training set, and whether it should have left the building at all, takes a lot longer. Most employees spend the four seconds and skip the rest.

That gap, four seconds of action against a longer act of judgment that never happens, is the entire story of shadow AI data exposure. It is not a story about reckless people. It is a story about an interface designed to feel like a conversation rather than a data transmission, and a generation of security tooling that cannot see the conversation at all.

In 2025 and 2026, three independent bodies of research, two built on real telemetry rather than surveys, converged on the same uncomfortable picture: generative AI has quietly become the top channel for corporate data leaving the enterprise, and almost nobody is watching it.

The headline: GenAI is now the top exfiltration channel

LayerX's Enterprise AI and SaaS Data Security Report is the most directly relevant data set here because it is built from browser telemetry, not from asking people what they think they do. When you measure actual paste and upload events in the browser, the numbers are stark.

77% of enterprise AI users paste data into GenAI prompts.
Users average roughly 14 pastes per day into non-corporate accounts, with at least 3 of those containing sensitive data.
About 22% of all pastes carry PII or PCI data.
Roughly 40% of files uploaded to GenAI contain PII or PCI.
GenAI accounts for 32% of all corporate-to-personal data movement, making it the single largest exfiltration channel in the enterprise.

Read that last point again. Not one of the larger channels. The largest. According to reporting from The Hacker News on the LayerX findings (October 2025), copy-paste into AI tools has overtaken the file transfers, personal email, and cloud-sync flows that DLP programs were built around. eSecurity Planet reached the same conclusion in its coverage: copy-paste now exceeds file transfer as the top corporate data exfiltration vector.

The detail that should keep CISOs up at night is not the volume; it is the routing. LayerX found that 82% of those pastes come from unmanaged personal accounts rather than enterprise-sanctioned, SSO-backed AI access. The data is not flowing through the front door you instrumented. It is flowing through a personal-account ChatGPT login in a browser tab, on a corporate laptop, completely outside your identity and DLP perimeter.

As LayerX put it, enterprises have little to no visibility into what data is being shared, creating a massive blind spot. That single sentence is the thesis of this entire piece, and it maps cleanly onto the principle Anomity is built around: you can't govern what you can't see.

Where the data goes: ChatGPT is the gravity well

Shadow AI is not evenly distributed across vendors. LayerX found that roughly 92% of all enterprise AI usage is concentrated in ChatGPT, with Gemini, Claude, and Copilot splitting the long tail. ChatGPT also carries the highest measured rate of sensitive-data exposure of any single tool. If you are triaging where to focus discovery and monitoring, that concentration is a gift: most of the exposure surface lives behind one product.

But concentration of usage is not the same as concentration of risky behavior. Cyberhaven's AI Adoption and Risk research breaks out how much of each tool's usage runs through personal, unmanaged accounts, and the spread is wide:

AI tool	Share accessed via personal accounts (Cyberhaven)
ChatGPT	~32.3%
Claude	~58%
Perplexity	~60%

The lesson: ChatGPT is where the volume is, but tools like Claude and Perplexity see a majority of their access through personal, unmanaged accounts. A discovery program that only watches the dominant vendor will systematically miss the tools most likely to be used off the books.

The trend line is going the wrong way

A single snapshot can be dismissed as a moment in time. A trend cannot. Cyberhaven's longitudinal research tracks the share of AI inputs that contain sensitive data, and that share has climbed relentlessly: from roughly a tenth of inputs in 2023 to 39.7% in its 2026 report. Nearly four in ten things employees feed into AI tools now contain something sensitive.

Cyberhaven frames the cadence memorably: the average employee inputs sensitive data into an AI tool roughly once every three days. Multiply that across a workforce of thousands and the aggregate is a continuous, low-grade hemorrhage of data, not a series of isolated incidents.
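To make that aggregate concrete, here is a back-of-the-envelope sketch. Only the once-every-three-days cadence comes from Cyberhaven; the headcount, the 250 working days per year, and the function name are illustrative assumptions, not figures from any report.

```python
def sensitive_inputs_per_year(employees: int,
                              days_between_inputs: float = 3.0,
                              working_days: int = 250) -> int:
    """Rough yearly count of sensitive AI inputs across a workforce.

    days_between_inputs: Cyberhaven's cadence (about one input every 3 days).
    working_days: assumed working days per year (illustrative).
    """
    inputs_per_employee = working_days / days_between_inputs
    return round(employees * inputs_per_employee)


# Hypothetical 5,000-person company.
print(sensitive_inputs_per_year(5000))
```

Under those assumptions, a 5,000-person workforce produces on the order of 400,000 sensitive inputs a year, which is why the pattern is better treated as a continuous flow than as a set of discrete incidents.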

Worth flagging for the careful reader: secondary coverage sometimes cites much higher daily paste counts. Those figures usually reflect all pastes, including benign internal ones. The primary-source framing we are using here, about 14 pastes per day into non-corporate accounts with at least 3 of them sensitive, is the tighter, more defensible measure of risk.

What it costs when it goes wrong

Behavioral telemetry tells you the exposure is happening. IBM's 2025 Cost of a Data Breach Report, conducted by the Ponemon Institute across roughly 600 organizations from March 2024 to February 2025, tells you what it costs when that exposure turns into a breach. The shadow-AI numbers are now broken out as their own category, and they are not subtle.

1 in 5 (20%) breaches involved shadow AI.
Shadow-AI breaches cost roughly $670K more than other breaches, landing at about $4.63M, against a global average breach cost of $4.44M (down from $4.88M, the first decline in five years).
13% of organizations reported breaches of AI models or applications, and 97% of those lacked proper AI access controls.
63% of breached organizations had no AI governance policy or were still building one.
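The dollar figures above involve two different comparisons, so it helps to work the arithmetic. This is a minimal sketch using only the numbers quoted in this section; the implied baseline is derived here by subtraction and is not a figure quoted from IBM.

```python
# Figures quoted above, in millions of USD.
SHADOW_AI_BREACH = 4.63       # average cost of a shadow-AI breach
SHADOW_AI_PREMIUM = 0.67      # "roughly $670K more than other breaches"
GLOBAL_AVERAGE = 4.44         # 2025 global average breach cost
PRIOR_GLOBAL_AVERAGE = 4.88   # previous year's global average

# Cost of the comparison group implied by the premium (derived, not quoted).
implied_baseline = round(SHADOW_AI_BREACH - SHADOW_AI_PREMIUM, 2)

# Gap between shadow-AI breaches and the global average.
gap_vs_global = round(SHADOW_AI_BREACH - GLOBAL_AVERAGE, 2)

# Year-over-year decline in the global average, in percent.
decline_pct = round(
    (PRIOR_GLOBAL_AVERAGE - GLOBAL_AVERAGE) / PRIOR_GLOBAL_AVERAGE * 100, 1
)

print(implied_baseline, gap_vs_global, decline_pct)
```

Read this way, the $670K premium is measured against the other breaches (an implied baseline near $3.96M), while the gap to the $4.44M global average is closer to $190K, because that average already includes the shadow-AI cases.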

There is a second, sharper finding in the IBM data, reported in the official IBM newsroom release (July 30, 2025): shadow-AI incidents do not just cost more, they expose worse data. Shadow-AI breaches disproportionately exposed PII (65%) and intellectual property (40%). The data leaking through the unmanaged AI channel is, statistically, among the most sensitive data you hold.

Stack the three research bodies together and the chain is complete. LayerX shows the exposure is happening at scale through invisible channels. Cyberhaven shows the trend accelerating. IBM shows what it costs and confirms that the organizations getting hit are the ones without governance. Coverage from Kiteworks on the IBM report drew the same line: the controls gap, not the technology, is the predictor of cost.

The real root cause is behavioral, not technical

It is tempting to read these numbers as a story about careless employees who need more training. That misreads the mechanism. The problem is not ignorance; it is the phenomenology of the interface.

People experience a prompt as a conversation, not as a transmission. The interface is engineered to feel casual and conversational, and the persistent belief that the chat is safe enough, that the data goes nowhere and the conversation disappears, is the core of the problem.

The Register's October 2025 reporting on shadow AI and leaked secrets landed on the same human truth, and it is the spine of why this is so hard to stop. A chat box does not feel like uploading a file to a third party. It feels like talking. Nobody runs a mental data-classification check before saying a sentence out loud, and the prompt box borrows that same conversational reflex.

This is why awareness training alone has a low ceiling. You cannot train your way out of an interface that is specifically designed to lower the friction and the felt-significance of sharing. The four-second action will always beat the longer act of judgment, because the interface optimized the four seconds and erased the cue that judgment was even required.

A working taxonomy of shadow AI data exposure

To govern this, it helps to name the distinct ways data leaks. They are not interchangeable, and they fail different controls.

Exposure type	What happens	Why it evades controls
Inadvertent paste	Employee pastes sensitive text directly into a prompt	No file artifact, no email, just a clipboard event in a browser tab
File upload	Transcripts, spreadsheets, contracts uploaded to GenAI (~40% contain PII/PCI)	Upload goes to a personal account outside sanctioned SaaS API monitoring
Unmanaged personal account	GenAI accessed via personal login outside SSO (82% of pastes)	No corporate identity to attach a policy to
Agentic / MCP data movement	Autonomous agents and MCP servers read and transmit data programmatically	Happens at machine speed with no human paste event to observe
Lethal trifecta exfiltration	Private-data access + untrusted content + external comms enable injected theft	The exfiltration is triggered by the attacker's content, not the user

The first three are the human story the telemetry measures today. The last two are where this goes next, and where it stops being a DLP problem and becomes an agent-governance problem.

Why traditional DLP cannot catch up

Legacy data loss prevention was architected for a world of email gateways, endpoint file operations, and a known set of sanctioned SaaS apps reachable by API. None of those vantage points see a clipboard paste into a text box on chatgpt.com under a personal login. The event leaves no email, creates no managed-file artifact, and never touches a sanctioned API.

That is the architectural reason the 82% personal-account figure is so damaging. The majority of exposure is happening in exactly the place where the dominant control has no sensor. Across the industry, the large majority of organizations report lacking the technical controls to govern AI data flows, and reporting from eSecurity Planet on shadow AI and DLP reaches the same verdict: the tools most enterprises own were not built to see this. We unpack the architecture gap in detail in our post on why traditional DLP fails for AI agents.

The conclusion is not that DLP is useless. It is that visibility has to come before control. A blocking rule, a policy, an alert, all of them require that something can first observe the event. The 82% blind spot is, before anything else, a sensing problem.
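As a rough illustration of what a sensor has to compute once it can observe the event, here is a minimal sketch of a paste-event check. The `PasteEvent` shape, the field names, and the two patterns are hypothetical and deliberately simplistic; this is not how LayerX or any DLP product classifies data, only an example of the kind of logic that needs the paste event as input.

```python
import re
from dataclasses import dataclass

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to weed out digit runs that are not card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:        # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


@dataclass
class PasteEvent:             # hypothetical event shape
    destination: str          # e.g. "chatgpt.com"
    managed_account: bool     # True if the session uses corporate SSO
    text: str


def findings(event: PasteEvent) -> list[str]:
    """Return labels for sensitive content found in a paste event."""
    found = []
    if EMAIL_RE.search(event.text):
        found.append("email_address")
    for match in CARD_RE.finditer(event.text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            found.append("card_number")
            break
    # Only escalate the account type when there is something sensitive.
    if found and not event.managed_account:
        found.append("unmanaged_account")
    return found
```

None of this logic is novel. The point is that it can only run where the paste event and the account context are both visible, which is exactly what an email gateway or a SaaS API connector never receives.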

The agentic extension: paste exposure at machine speed

A human pasting into ChatGPT is a slow, manual, one-event-at-a-time problem. Autonomous agents and Model Context Protocol (MCP) servers remove every one of those limits. An agent reads private data and transmits it programmatically, continuously, across many systems, with no clipboard event for anyone to observe. The same exposure pattern, ported to machine speed.

This is where Simon Willison's lethal trifecta, introduced in June 2025, becomes the framing that matters. When an agent simultaneously has access to private data, exposure to untrusted content, and a path for external communication, an attacker who controls the untrusted content can instruct the agent to exfiltrate. The user never pastes anything; the attacker's content does the pasting. We cover the production version of this in our post on the lethal trifecta and agent data exfiltration.
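The trifecta is a conjunction of three capabilities, which makes it straightforward to express as a check over an agent inventory. The sketch below assumes a hypothetical `AgentProfile` record with one boolean per capability; in practice, deriving those booleans from real agent and MCP configurations is the hard part, and nothing here reflects a specific product's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:                      # hypothetical inventory record
    name: str
    reads_private_data: bool             # e.g. CRM, mailbox, file shares
    ingests_untrusted_content: bool      # e.g. web pages, inbound email
    can_communicate_externally: bool     # e.g. HTTP requests, outbound email


def has_lethal_trifecta(agent: AgentProfile) -> bool:
    """True only when all three capabilities coexist in one agent."""
    return (agent.reads_private_data
            and agent.ingests_untrusted_content
            and agent.can_communicate_externally)


def flag_fleet(agents: list[AgentProfile]) -> list[str]:
    """Names of agents whose configuration matches the trifecta."""
    return [a.name for a in agents if has_lethal_trifecta(a)]


# Illustrative fleet; names and capabilities are made up.
fleet = [
    AgentProfile("support-summarizer", True, True, True),
    AgentProfile("internal-search", True, False, False),
    AgentProfile("web-researcher", False, True, True),
]
print(flag_fleet(fleet))
```

Removing any one of the three legs clears the flag, which is why the usual mitigation is to break the combination rather than to try to filter the untrusted content.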

OWASP has codified the agentic side of this risk. Prompt Injection sits at the top of the OWASP Top 10 for LLM Applications (2025) as LLM01, and Excessive Agency (LLM06) captures the over-permissioned agent that can act far beyond what it should. The OWASP Top 10 for Agentic Applications, released in December 2025, extends that into the autonomous setting. For the broader pattern of how injected instructions become data theft, see our explainer on indirect prompt injection.

From exposure to governance: visibility first

The through-line from the human paste event to the autonomous agent is identity and visibility. Shadow AI starts as an employee using ChatGPT under a personal login that IT never provisioned. It ends as an agent or MCP server running with credentials and permissions nobody inventoried. In both cases the failure is the same: the activity is invisible to the people responsible for governing it.

This is precisely the gap Anomity is built to close. We covered the founding rationale in our post on AI agents as the new shadow IT, and what we actually found when we started scanning real environments in our write-up on scanning AI configs. The job is the same whether the actor is a human or a machine: discover it, inventory it, watch its behavior, and flag the anomalies.

Concretely, the visibility layer has to do four things that legacy DLP cannot:

Discover and inventory every AI tool, agent, and MCP server in actual use, including personal-account access, not just the sanctioned list.
Monitor behavior at the data layer, the machine analog of paste-event telemetry, so you see what agents and MCPs do with data, not just what they were configured to do.
Track permissions against use, mapping OWASP Excessive Agency to real over-provisioned agents so you can spot the gap between granted and needed.
Detect lethal-trifecta configurations, flagging the unsafe-by-design pattern where private data, untrusted input, and external comms coexist in one agent.
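The "permissions against use" item above reduces to a set difference between what an agent was granted and what it was observed using. A minimal sketch, assuming permissions can be normalized to comparable scope strings; the scope names and function names are hypothetical.

```python
def excess_permissions(granted: set[str], used: set[str]) -> set[str]:
    """Scopes that were granted but never observed in use."""
    return granted - used


def excess_ratio(granted: set[str], used: set[str]) -> float:
    """Fraction of granted scopes that went unused (0.0 if nothing granted)."""
    if not granted:
        return 0.0
    return len(granted - used) / len(granted)


# Illustrative agent: four scopes granted, two actually exercised.
granted = {"crm:read", "crm:export", "email:send", "files:read"}
used = {"crm:read", "files:read"}

print(sorted(excess_permissions(granted, used)))
print(excess_ratio(granted, used))
```

Here the unused scopes are `crm:export` and `email:send`, and both happen to be export or outbound paths. Unused scopes of that kind are the ones worth revoking first, since they are what turns an over-provisioned agent into an exfiltration route.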

Enforcement, whether DLP, policy, or a hard block, sits on top of that layer. It cannot sit anywhere else, because there is nowhere else to stand. If you want the next step, our guide on how to build an AI agent inventory walks through the discovery foundation, and our post on why we built Anomity explains the visibility-first thesis end to end.

The bottom line

The research is no longer ambiguous. LayerX's browser telemetry shows GenAI is the top exfiltration channel and that 82% of it flows through accounts you cannot see. Cyberhaven shows the sensitive-data share of AI inputs climbing to nearly 40% and still rising. IBM shows that one in five breaches now involves shadow AI, and that those breaches cost roughly $670K more and disproportionately expose your most sensitive data, with governance gaps as the common denominator.

The behavioral root cause means you cannot train this away, and the architectural reality means legacy DLP cannot see it. What is left is the oldest principle in security, restated for a new layer: you can't govern what you can't see. Build the visibility first, for the humans pasting into ChatGPT today and for the agents and MCP servers that will be moving that same data at machine speed tomorrow. Everything else is enforcement, and enforcement has to have something to stand on.

Frequently asked questions

What is shadow AI data exposure?

Shadow AI data exposure is the leakage of sensitive corporate data into generative AI tools, like ChatGPT, Claude, Gemini, and Copilot, that are used outside of IT and security oversight. It usually happens when employees paste text or upload files into AI tools through personal accounts that are not covered by enterprise SSO or DLP. Browser telemetry from LayerX shows GenAI is now the single largest corporate-to-personal data movement channel, accounting for 32% of that flow.

How much sensitive data do employees actually paste into ChatGPT?

LayerX's browser-based telemetry found that 77% of enterprise AI users paste data into GenAI prompts, averaging about 14 pastes per day into non-corporate accounts, with at least 3 of those containing sensitive data. Around 22% of all pastes carry PII or PCI data, and roughly 40% of files uploaded to GenAI contain PII or PCI. Separately, Cyberhaven's 2026 analysis found the share of AI inputs containing sensitive data reached 39.7%, up from roughly a tenth of inputs in 2023.

Which AI tool sees the most shadow AI usage?

ChatGPT dominates enterprise GenAI usage by a wide margin. LayerX found that roughly 92% of all enterprise AI usage is concentrated in ChatGPT, even though only a minority of employees use GenAI tools at all. The picture differs for how much of each tool's usage runs through personal accounts: Cyberhaven reported personal-account rates of about 32.3% for ChatGPT, but roughly 58% for Claude and about 60% for Perplexity.

How much does a shadow AI breach cost?

IBM's 2025 Cost of a Data Breach Report, conducted by Ponemon across roughly 600 organizations, found that breaches involving shadow AI cost about $670,000 more than other breaches (about $4.63M for shadow-AI breaches), against a global average breach cost of $4.44M, the first decline in five years. One in five breaches involved shadow AI, and shadow-AI incidents were more likely to expose PII (65%) and intellectual property (40%).

Why does traditional DLP fail to stop shadow AI data exposure?

Traditional DLP was built to inspect email, endpoint file movement, and sanctioned SaaS APIs. It generally cannot see a clipboard paste into a web text box on an unmanaged personal account inside a browser tab. Since LayerX found 82% of GenAI pastes come from personal accounts, the majority of the exposure happens precisely where legacy DLP has no visibility. See our breakdown of why traditional DLP fails for AI agents.

Is shadow AI worse with autonomous agents and MCP servers?

Yes, the exposure compounds. A human paste is a single, slow, manual event; an autonomous agent or MCP server can read and transmit private data programmatically, at machine speed, across many systems. Simon Willison's lethal trifecta, the combination of private-data access, exposure to untrusted content, and an external communication path, describes how an injected instruction can turn a helpful agent into an exfiltration tool without any human in the loop.

What should a CISO do first about shadow AI?

Start with discovery, not blocking. You cannot write policy, tune DLP, or run an incident response against tools you cannot see. Build an inventory of which AI tools, agents, and MCP servers are actually in use across the fleet, including personal-account access, then layer monitoring and enforcement on top. Anomity exists to provide that visibility layer for both human GenAI use and the autonomous agent and MCP layer behind it.

How common is it for breached organizations to lack AI governance?

Very common. IBM found that 63% of breached organizations either had no AI governance policy or were still building one, and that among organizations reporting breaches of AI models or applications, 97% lacked proper AI access controls. The data points to a widespread gap between AI adoption and the controls needed to govern it.