For a few hours on the afternoon of 2026-06-05, the Claude API appears to have been returning responses that did not belong to the user who asked for them. You send a request, and you get back output that reads like an answer to someone else's prompt.
Anthropic's status page logged the event as "elevated errors on many Claude models," covering Opus 4.5 through 4.8 and Sonnet 4.6. It does not, as of writing, describe a data leak. The cross-user read comes from circulating screenshots and firsthand reports, so treat it as an early interpretation, not a confirmed breach.
The interesting part is not whether one provider had a bad afternoon. It is the failure class. When this kind of thing happens, it almost always traces back to shared mutable state under load, and that is a risk every fast-scaling AI provider is carrying right now.
What appears to have happened
According to Anthropic's status page, the incident ran through the afternoon of 2026-06-05: investigation opened around 15:19 UTC, the issue was identified by 15:43 UTC, and it was marked resolved by 18:28 UTC. The affected models were Claude Opus 4.5, 4.6, 4.7, and 4.8, along with Sonnet 4.6. The official label was "elevated errors," the generic bucket providers use for anything from timeouts to malformed responses.
The reports that drew attention describe something more specific. Users said that some API calls came back with content that had nothing to do with their own prompt, and in at least one widely shared case, one person's task appeared inside a completely unrelated user's response. Several people said they had received what looked like another customer's inference output, and that they checked carefully before concluding it was an upstream error rather than a bug on their end.
Two things are true at once. Anthropic has not confirmed cross-user data exposure, and the public evidence is consistent with it. A responsible reading holds both: this looks like cross-tenant response bleed, and it is not yet officially confirmed as one. I am not reproducing the leaked screenshots here, because they contain another customer's prompt and output, which is exactly the data that should not travel any further.
The failure class: shared mutable state under load
Cross-tenant bleed is when one customer's data surfaces in another customer's session. It is one of the oldest and most dangerous bugs in multi-tenant systems, and it rarely comes from a dramatic breach. It usually comes from a piece of shared, mutable state that was supposed to be keyed per request and, under the wrong conditions, was not.
A modern AI API is not a single program answering one request at a time. It is a stack of shared layers: load balancers, request routers, gateways, queues, in-memory caches, and connection pools, all serving every customer at once to keep latency and cost down. Each of those layers holds state. Each is a place where a request can pick up the wrong response if a cache key collides, a connection is reused after it should have been discarded, or a cancelled request leaves a stale object behind.
The symptom is almost always the same: you ask for your data and you get someone else's. The cause is almost always boring: a small mistake in how a shared object is reused, triggered by load or by an edge case like a cancelled or timed-out request.
We have seen this exact pattern before
The clearest precedent is OpenAI, in March 2023. On 2023-03-20, a change to its servers caused a spike in cancelled Redis requests, which tripped a bug in the redis-py client library. For a window that day, some ChatGPT users could see the chat titles of other active users, and in some cases the first message of a newly created conversation.
The same incident exposed limited billing data. OpenAI notified about 1.2% of ChatGPT Plus subscribers that another user may have seen their first and last name, billing address, credit card type, expiration date, and the last four digits of their card. Full card numbers were never exposed. As Help Net Security reported at the time, and as OpenAI confirmed in its own postmortem, the root cause was a shared caching and connection layer returning data to the wrong client after cancelled requests.
That is the same failure class as the symptom described in the Claude reports: a shared, mutable layer handing one tenant's data to another under load. Different company, different library, same shape.
Why scaling makes this more likely, not less
Here is the uncomfortable structural part. AI providers are adding capacity faster than almost any infrastructure in computing history. Demand outruns hardware, so teams keep bolting on layers to cope: more caching to cut token costs, more routing to spread load across regions and model versions, more proxies and gateways to manage quotas and failover.
Every one of those layers is a performance win and a new place for state to leak. A cache that serves the wrong key, a router that reuses a connection, a proxy that folds one response into another stream: each is a single bug away from cross-tenant bleed. The harder a provider scales, the more of these layers it runs, and the more load it pushes through them.
So this is less an Anthropic story than an everyone story. The providers racing hardest to add capacity are precisely the ones most exposed to this class of bug, because speed and shared infrastructure are how you serve millions of requests cheaply. It is not a sign that one company is careless. It is a property of the architecture the whole industry is building on.
What this means if you build on LLM APIs
The practical takeaway is a mindset shift. An LLM API response is not a trusted, private channel between you and the model. It is data coming back from a massively shared, fast-changing system, and on a bad day it can be wrong, stale, or cross-wired to another tenant.
Treat it that way:
- Assume a response can occasionally belong to the wrong request. Do not design flows where one cross-wired response silently corrupts a user's record or exposes it to someone else.
- Do not send data to an LLM API that you could not tolerate surfacing elsewhere, unless your contract and the provider's controls genuinely cover it. Zero-retention and enterprise isolation terms matter here.
- Validate and constrain responses before you act on them. Check shape, type, and plausibility, the way you would validate any untrusted input.
- Log enough to detect bleed. If you never store request and response metadata, you cannot tell the difference between a model error and a response that was never yours.
There is a browser-layer version of this that is easy to miss. A growing number of apps pipe model output straight into the page: streamed answers, AI-generated HTML, tool results rendered inline. If that output can be cross-wired or injected, rendering it blindly turns a backend issue into a client-side one. You can end up showing another user's content to your user, or executing markup you did not write, inside a session where your user is already authenticated.
What to do this week
- Map every place an LLM API response enters your system, and mark which ones render directly in the browser.
- Treat those responses as untrusted input: validate structure, escape anything rendered into the page, and never inject raw model HTML without sanitizing it.
- Decide explicitly what data you are willing to send to each provider, and confirm the retention and isolation terms that apply.
- Add logging and alerting that can tell a malformed response apart from one that does not match the request you sent.
- Watch the browser layer, where AI output, third-party scripts, and user data now meet in the same session.
How cside fits
cside does not run inside Anthropic's or any provider's backend, and it cannot fix a cache that hands back the wrong tenant's data. What it addresses is the same class of problem one step closer to your user: the browser, where AI responses, third-party scripts, and authenticated sessions now share the same page.
cside gives runtime visibility into what actually executes in that browser session: which scripts load, how they change after deploy, what data they read, and where they send it. As more applications render model output directly into the page, that visibility is how you catch the client-side consequences of a wrong or cross-wired response, such as content rendered into the wrong session or a script reaching for data it should never touch.
The broader point is the one cside has always made about third-party scripts, now generalized to AI infrastructure. Every layer you add to move faster is a layer where data can surface in the wrong place. You cannot remove those layers, but you can instrument the one closest to your user.
Start with client-side security for runtime script monitoring, or Privacy Watch to see exactly what the code on your pages collects and sends.
Further reading on cside
- How compromised third-party scripts can prompt-inject AI agents
- On-device inference is coming for your security stack
- AI is compressing the exploit cycle
- Best practices for securing third-party scripts
As of 2026-06-05. The Claude API incident details reflect Anthropic's status page and public user reports available at that date. Anthropic has characterized the event as elevated errors and has not, as of writing, confirmed cross-user data exposure.








