When an AI API returns another user's response: shared caches and cross-tenant leaks

A Claude API incident appears to have returned other users' responses. Why shared caches cause cross-tenant leaks, and how to build around the risk.

Jun 05, 2026 • 8 min read

Simon Wijckmans Founder & CEO

Illustration of a Claude API request and a returned response that belongs to a different user, flagged 'response does not match request', between Anthropic and cside logos

For a few hours on the afternoon of 2026-06-05, the Claude API appears to have been returning responses that did not belong to the user who asked for them. Users reported sending a request and getting back output that read like an answer to someone else's prompt.

Anthropic's status page logged the event as "elevated errors on many Claude models," covering Opus 4.5 through 4.8 and Sonnet 4.6. It does not, as of writing, describe a data leak. The cross-user interpretation comes from circulating screenshots and firsthand reports, so treat it as an early reading of the evidence, not a confirmed breach.

The interesting part is not whether one provider had a bad afternoon. It is the failure class. When this kind of thing happens, it almost always traces back to shared mutable state under load, and that is a risk every fast-scaling AI provider is carrying right now.

What appears to have happened

According to Anthropic's status page, the incident ran through the afternoon of 2026-06-05: investigation opened around 15:19 UTC, the issue was identified by 15:43 UTC, and it was marked resolved by 18:28 UTC. The affected models were Claude Opus 4.5, 4.6, 4.7, and 4.8, along with Sonnet 4.6. The official label was "elevated errors," the generic bucket providers use for anything from timeouts to malformed responses.

The reports that drew attention describe something more specific. Users said that some API calls came back with content that had nothing to do with their own prompt, and in at least one widely shared case, one person's task appeared inside a completely unrelated user's response. Several people said they had received what looked like another customer's inference output, and that they checked carefully before concluding it was an upstream error rather than a bug on their end.

Two things are true at once. Anthropic has not confirmed cross-user data exposure, and the public evidence is consistent with such exposure having occurred. A responsible reading holds both: this looks like cross-tenant response bleed, and it is not yet officially confirmed as one. I am not reproducing the leaked screenshots here, because they contain another customer's prompt and output, which is exactly the data that should not travel any further.

The failure class: shared mutable state under load

Cross-tenant bleed is when one customer's data surfaces in another customer's session. It is one of the oldest and most dangerous bugs in multi-tenant systems, and it rarely comes from a dramatic breach. It usually comes from a piece of shared, mutable state that was supposed to be keyed per request and, under the wrong conditions, was not.

A modern AI API is not a single program answering one request at a time. It is a stack of shared layers: load balancers, request routers, gateways, queues, in-memory caches, and connection pools, all serving every customer at once to keep latency and cost down. Each of those layers holds state. Each is a place where a request can pick up the wrong response if a cache key collides, a connection is reused after it should have been discarded, or a cancelled request leaves a stale object behind.

The symptom is almost always the same: you ask for your data and you get someone else's. The cause is almost always boring: a small mistake in how a shared object is reused, triggered by load or by an edge case like a cancelled or timed-out request.
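
To make the cache-key case concrete, here is a minimal sketch. Everything in it is hypothetical: the `ResponseCache` class, the `tenant_id` parameter, and the `fake_model` stand-in are illustrative names, not any provider's real design. It only shows how a key that leaves out the tenant lets two customers who send the same prompt text share one cached answer.

```python
import hashlib
from typing import Callable


class ResponseCache:
    """Toy response cache shared by every tenant (hypothetical design)."""

    def __init__(self, scope_by_tenant: bool) -> None:
        self.scope_by_tenant = scope_by_tenant
        self._store: dict[str, str] = {}

    def _key(self, tenant_id: str, prompt: str) -> str:
        if self.scope_by_tenant:
            # Safe variant: the tenant is part of the key material.
            material = f"{tenant_id}\x00{prompt}"
        else:
            # Buggy variant: identical prompt text from two tenants collides.
            material = prompt
        return hashlib.sha256(material.encode("utf-8")).hexdigest()

    def get_or_compute(
        self, tenant_id: str, prompt: str, compute: Callable[[str, str], str]
    ) -> str:
        key = self._key(tenant_id, prompt)
        if key not in self._store:
            self._store[key] = compute(tenant_id, prompt)
        return self._store[key]


def fake_model(tenant_id: str, prompt: str) -> str:
    # Stand-in for inference: the answer depends on the tenant's private context.
    return f"summary of {tenant_id}'s private document"


PROMPT = "Summarize the attached document."

leaky = ResponseCache(scope_by_tenant=False)
leaky.get_or_compute("alice", PROMPT, fake_model)
print(leaky.get_or_compute("bob", PROMPT, fake_model))  # bob is served alice's cached answer

safe = ResponseCache(scope_by_tenant=True)
safe.get_or_compute("alice", PROMPT, fake_model)
print(safe.get_or_compute("bob", PROMPT, fake_model))  # bob gets his own answer
```

The fix in the sketch is one line of key material, which is the point: the bug is small, and nothing about the buggy version fails until two tenants happen to collide.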

We have seen this exact pattern before

The clearest precedent is OpenAI, in March 2023. On 2023-03-20, a change to its servers caused a spike in cancelled Redis requests, which tripped a bug in the redis-py client library. For a window that day, some ChatGPT users could see the chat titles of other active users, and in some cases the first message of a newly created conversation.

The same incident exposed limited billing data. OpenAI notified about 1.2% of ChatGPT Plus subscribers, those active during a specific nine-hour window, that another user may have seen their first and last name, email address, billing address, credit card type, expiration date, and the last four digits of their card. Full card numbers were never exposed. As Help Net Security reported at the time, and as OpenAI confirmed in its own postmortem, the root cause was a shared caching and connection layer returning data to the wrong client after cancelled requests.

The symptom described in the Claude reports fits the same failure class: a shared, mutable layer handing one tenant's data to another under load. Different company, different library, same shape.
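
The shape of that bug is easy to reproduce in a toy model. The sketch below is not redis-py code and does not use its API. `PooledConnection`, `Pool`, and the `discard_on_cancel` flag are hypothetical names for a simplified simulation of a pooled connection that returns replies in the order requests were sent. A request that is cancelled after it is sent, but before its reply is read, leaves that reply behind for the next caller.

```python
from collections import deque
from typing import Optional


class PooledConnection:
    """Toy connection: replies come back in the order requests were sent."""

    def __init__(self) -> None:
        self._pending: deque[str] = deque()

    def send(self, owner: str) -> None:
        # The simulated server queues exactly one reply per request.
        self._pending.append(f"private data of {owner}")

    def read(self) -> str:
        # Returns the oldest unread reply, whoever it was meant for.
        return self._pending.popleft()


class Pool:
    """Single-connection pool, enough to show the reuse problem."""

    def __init__(self, discard_on_cancel: bool) -> None:
        self.discard_on_cancel = discard_on_cancel
        self._conn = PooledConnection()

    def request(self, owner: str, cancelled: bool = False) -> Optional[str]:
        conn = self._conn
        conn.send(owner)
        if cancelled:
            # The caller gave up after sending but before reading the reply.
            if self.discard_on_cancel:
                # Safe variant: never reuse a connection with an unread reply.
                self._conn = PooledConnection()
            return None
        return conn.read()


leaky = Pool(discard_on_cancel=False)
leaky.request("alice", cancelled=True)
print(leaky.request("bob"))  # bob reads the reply that was meant for alice

safe = Pool(discard_on_cancel=True)
safe.request("alice", cancelled=True)
print(safe.request("bob"))  # bob reads his own reply
```

The safe variant pays for correctness by throwing a connection away on every cancellation. That is the trade a pool makes under a spike of cancelled requests, when the pressure to reuse connections is highest.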

Why scaling makes this more likely, not less

Here is the uncomfortable structural part. AI providers are adding capacity faster than almost any infrastructure in computing history. Demand outruns hardware, so teams keep bolting on layers to cope: more caching to cut token costs, more routing to spread load across regions and model versions, more proxies and gateways to manage quotas and failover.

Every one of those layers is a performance win and a new place for state to leak. A cache that serves the wrong key, a router that reuses a connection, a proxy that folds one response into another stream: each is a single bug away from cross-tenant bleed. The harder a provider scales, the more of these layers it runs, and the more load it pushes through them.

So this is less an Anthropic story than an everyone story. The providers racing hardest to add capacity are precisely the ones most exposed to this class of bug, because speed and shared infrastructure are how you serve millions of requests cheaply. It is not a sign that one company is careless. It is a property of the architecture the whole industry is building on.

What this means if you build on LLM APIs

The practical takeaway is a mindset shift. An LLM API response is not a trusted, private channel between you and the model. It is data coming back from a massively shared, fast-changing system, and on a bad day it can be wrong, stale, or cross-wired to another tenant.

Treat it that way:

Assume a response can occasionally belong to the wrong request. Do not design flows where one cross-wired response silently corrupts a user's record or exposes it to someone else.
Do not send data to an LLM API that you could not tolerate surfacing elsewhere, unless your contract and the provider's controls genuinely cover it. Zero-retention and enterprise isolation terms matter here.
Validate and constrain responses before you act on them. Check shape, type, and plausibility, the way you would validate any untrusted input.
Log enough to detect bleed. If you never store request and response metadata, you cannot tell the difference between a model error and a response that was never yours.
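
One way to put the validation point into practice is sketched below, under stated assumptions. It assumes your prompt asks the model for a JSON object with `title`, `bullet_points`, and an echoed `request_nonce` field. That schema, the nonce convention, and the names `Summary`, `new_nonce`, and `parse_summary` are all hypothetical. An echoed nonce is a heuristic, not a guarantee, because a model can fail to echo it, but a response that carries a different nonce is a strong hint that it was never yours.

```python
import json
import secrets
from dataclasses import dataclass


class UntrustedResponseError(ValueError):
    """Raised when a model response fails validation."""


@dataclass(frozen=True)
class Summary:
    title: str
    bullet_points: list[str]


def new_nonce() -> str:
    # Random per-request value that the prompt asks the model to echo back.
    return secrets.token_hex(8)


def parse_summary(raw_text: str, expected_nonce: str) -> Summary:
    """Validate shape, type, and plausibility before anything acts on the response."""
    try:
        payload = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise UntrustedResponseError("response is not valid JSON") from exc
    if not isinstance(payload, dict):
        raise UntrustedResponseError("response is not a JSON object")
    if payload.get("request_nonce") != expected_nonce:
        raise UntrustedResponseError("nonce mismatch: response may belong to another request")

    title = payload.get("title")
    bullets = payload.get("bullet_points")
    if not isinstance(title, str) or not 0 < len(title) <= 200:
        raise UntrustedResponseError("title missing or implausible")
    if not isinstance(bullets, list) or len(bullets) > 20:
        raise UntrustedResponseError("bullet_points missing or implausible")
    if not all(isinstance(item, str) for item in bullets):
        raise UntrustedResponseError("bullet_points must contain only strings")
    return Summary(title=title, bullet_points=bullets)
```

A caller would generate the nonce with `new_nonce()`, include it in the prompt, and only write the parsed `Summary` to a user's record if `parse_summary` returns without raising.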

There is a browser-layer version of this that is easy to miss. A growing number of apps pipe model output straight into the page: streamed answers, AI-generated HTML, tool results rendered inline. If that output can be cross-wired or injected, rendering it blindly turns a backend issue into a client-side one. You can end up showing another user's content to your user, or executing markup you did not write, inside a session where your user is already authenticated.
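
For the rendering case, the simplest defense is to treat model output as text, not markup. The sketch below assumes a server-side render step in Python. The `render_model_text` function and the `ai-answer` class name are hypothetical. It uses the standard library's `html.escape`, which only makes sense when you want the output shown as plain text. If you need to allow some model-generated HTML, escaping is not enough and you would need an allowlist-based sanitizer instead.

```python
import html


def render_model_text(model_output: str) -> str:
    """Wrap model output in a container after escaping HTML metacharacters."""
    # quote=True also escapes single and double quotes, so the result stays
    # safe if it is later placed inside an attribute value.
    escaped = html.escape(model_output, quote=True)
    return f'<div class="ai-answer">{escaped}</div>'


print(render_model_text("<script>alert(1)</script>"))
```

With this in place, a cross-wired or injected response can still show the wrong content, but it cannot execute as markup inside an authenticated session.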

What to do this week

Map every place an LLM API response enters your system, and mark which ones render directly in the browser.
Treat those responses as untrusted input: validate structure, escape anything rendered into the page, and never inject raw model HTML without sanitizing it.
Decide explicitly what data you are willing to send to each provider, and confirm the retention and isolation terms that apply.
Add logging and alerting that can tell a malformed response apart from one that does not match the request you sent.
Watch the browser layer, where AI output, third-party scripts, and user data now meet in the same session.
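
The logging item can be sketched roughly as follows. It assumes a hypothetical convention in which each prompt asks the model to echo a per-request `request_nonce` field in a JSON response. The `Verdict` enum, `classify`, `audit_record`, and the `llm_audit` logger name are illustrative too. The record stores SHA-256 digests instead of raw text, so the audit log itself does not become another copy of sensitive prompts.

```python
import hashlib
import json
import logging
import uuid
from enum import Enum

log = logging.getLogger("llm_audit")


class Verdict(str, Enum):
    OK = "ok"
    MALFORMED = "malformed"  # not the shape we asked for
    MISMATCH = "mismatch"    # well-formed, but not an answer to our request


def classify(raw_text: str, expected_nonce: str) -> Verdict:
    try:
        payload = json.loads(raw_text)
    except json.JSONDecodeError:
        return Verdict.MALFORMED
    if not isinstance(payload, dict):
        return Verdict.MALFORMED
    if payload.get("request_nonce") != expected_nonce:
        return Verdict.MISMATCH
    return Verdict.OK


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def audit_record(prompt: str, raw_text: str, expected_nonce: str) -> dict:
    """Build and log one audit entry per API call."""
    verdict = classify(raw_text, expected_nonce)
    record = {
        "call_id": str(uuid.uuid4()),
        "prompt_sha256": _digest(prompt),
        "response_sha256": _digest(raw_text),
        "verdict": verdict.value,
    }
    # Anything other than OK is logged at WARNING so alerting can key on it.
    level = logging.INFO if verdict is Verdict.OK else logging.WARNING
    log.log(level, json.dumps(record))
    return record
```

Alerting on a rising rate of `mismatch` verdicts, separately from `malformed`, is what lets you tell a provider-side cross-wiring event apart from ordinary model errors.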

How cside fits

cside does not run inside Anthropic's or any provider's backend, and it cannot fix a cache that hands back the wrong tenant's data. What it addresses is the same class of problem one step closer to your user: the browser, where AI responses, third-party scripts, and authenticated sessions now share the same page.

cside gives runtime visibility into what actually executes in that browser session: which scripts load, how they change after deploy, what data they read, and where they send it. As more applications render model output directly into the page, that visibility is how you catch the client-side consequences of a wrong or cross-wired response, such as content rendered into the wrong session or a script reaching for data it should never touch.

The broader point is the one cside has always made about third-party scripts, now generalized to AI infrastructure. Every layer you add to move faster is a layer where data can surface in the wrong place. You cannot remove those layers, but you can instrument the one closest to your user.

Start with client-side security for runtime script monitoring, or Privacy Watch to see exactly what the code on your pages collects and sends.

