AI Agent and Bot Detection: Telling Humans, Good Bots, and Malicious Agents Apart

A classification taxonomy and intent-based enforcement model for separating humans, good bots, and malicious agents, then deciding what each one gets.

Jul 11, 2026 6 min read

You have three problems wearing the same costume. A human reading your checkout page, a search crawler indexing it, and a stealth browser enumerating stolen cards against it can all present a plausible Chrome user-agent and a clean residential IP. Treat them as one bucket and you either block revenue or wave through fraud.

The fix is a taxonomy and a decision: classify each session into a known class, read what it is trying to do, and map that to exactly one action: allow, monitor, challenge, serve agent content, or block. This post is the classification and decision framework. For the underlying signal mechanics, the guide to detecting AI agent traffic covers identity, network, browser, and behavioral signals; for picking a vendor, see how to choose an AI agent detection solution. When you need to know why older defenses miss this traffic, legacy bot detection in the age of AI agents explains the gap. Here, the job is deciding what to do once you can see the traffic.

A five-class taxonomy that maps to action

"Good bot vs bad bot" is too coarse, because a consumer's shopping agent is automated and welcome, while a search crawler is automated and welcome for a completely different reason. Split traffic into five operational classes, each tied to a default action:

| Class | Examples | Intent | Default action |
| --- | --- | --- | --- |
| Human | Real visitors, logged-in customers | Browse, buy, manage account | Allow, monitor risk |
| Good bot | Googlebot, GPTBot, ClaudeBot, PerplexityBot, partner API bots | Index content, declared integration | Allow, rate-limit, verify identity |
| Neutral automation | Uptime monitors, link checkers, RSS/preview fetchers | Operational, low value, low harm | Monitor, rate-limit |
| Consumer AI agent | Shopping and research agents acting for a real user | Complete a task on behalf of a person | Allow or serve agent content |
| Malicious agent | Scrapers, card testers, account-abuse bots, stealth browsers | Extract value or commit fraud | Challenge or block |

The class is not fixed for a session. A consumer agent browsing product pages is in the "allow" column right up until it starts submitting payment forms at machine speed, at which point its intent, and its class, have changed.
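As a rough illustration, the taxonomy and its default actions can be held in a small lookup, with the session's class kept mutable so it can be revised when behavior changes. This is a minimal sketch: the `TrafficClass`, `DEFAULT_ACTIONS`, and `Session` names are hypothetical and do not correspond to any particular product's API.

```python
from enum import Enum


class TrafficClass(Enum):
    HUMAN = "human"
    GOOD_BOT = "good_bot"
    NEUTRAL_AUTOMATION = "neutral_automation"
    CONSUMER_AGENT = "consumer_agent"
    MALICIOUS_AGENT = "malicious_agent"


# Default actions per class, taken from the taxonomy table.
DEFAULT_ACTIONS = {
    TrafficClass.HUMAN: ("allow", "monitor"),
    TrafficClass.GOOD_BOT: ("allow", "rate_limit", "verify_identity"),
    TrafficClass.NEUTRAL_AUTOMATION: ("monitor", "rate_limit"),
    TrafficClass.CONSUMER_AGENT: ("allow", "serve_agent_content"),
    TrafficClass.MALICIOUS_AGENT: ("challenge", "block"),
}


class Session:
    """Hypothetical session record whose class can be revised mid-session."""

    def __init__(self, session_id, traffic_class):
        self.session_id = session_id
        self.traffic_class = traffic_class
        self.history = [traffic_class]  # keeps every class the session has held

    def reclassify(self, new_class):
        # Record the change only when the class actually differs.
        if new_class is not self.traffic_class:
            self.traffic_class = new_class
            self.history.append(new_class)

    def default_actions(self):
        return DEFAULT_ACTIONS[self.traffic_class]
```

A consumer agent that starts hammering the payment form would be moved with `session.reclassify(TrafficClass.MALICIOUS_AGENT)`, after which `default_actions()` returns the challenge-or-block pair instead of the allow pair.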

Identity tells you who; intent tells you what to do

Identity signals answer "who does this claim to be": user-agent, declared crawler name, fingerprint. They are necessary and almost free to spoof. A self-declaring GPTBot can be verified by cross-checking the request IP against the crawler's published ranges, which catches impersonators. But the dangerous classes never declare themselves.
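The IP cross-check can be sketched with Python's standard `ipaddress` module. The ranges below are placeholders from the reserved documentation blocks, not any crawler's real ranges, and the user-agent matching is deliberately simplistic; a real deployment would load the ranges each crawler operator publishes and refresh them on a schedule. The function and variable names are assumptions made for this example.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real crawler ranges.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24", "2001:db8::/32"],
}


def claimed_crawler(user_agent):
    """Return the crawler name the user-agent claims, or None."""
    lowered = user_agent.lower()
    for name in PUBLISHED_RANGES:
        if name.lower() in lowered:
            return name
    return None


def verify_declared_crawler(user_agent, remote_ip):
    """Classify a request as 'verified', 'impersonator', or 'undeclared'."""
    name = claimed_crawler(user_agent)
    if name is None:
        return "undeclared"  # no crawler claim to verify
    ip = ipaddress.ip_address(remote_ip)
    for cidr in PUBLISHED_RANGES[name]:
        network = ipaddress.ip_network(cidr)
        # Compare only within the same IP version.
        if ip.version == network.version and ip in network:
            return "verified"
    return "impersonator"  # claims the name, but from an unlisted address
```

An "undeclared" result says nothing about whether the session is safe. It only means identity verification has nothing to work with, which is the case for the dangerous classes.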

Intent signals answer "what is this session doing." They live in behavior and in the runtime, and they are far more expensive to fake convincingly:

  • navigator.webdriver set, or suppressed too cleanly, on a session that otherwise looks like vanilla Chrome.
  • CDP / Runtime leaks: Chrome DevTools Protocol and driver artifacts (ChromeDriver's cdc_ properties, stripped accessibility nodes) that betray Selenium, Playwright, or Puppeteer driving the page.
  • Fingerprint drift: WebGL, Canvas, and AudioContext fingerprints that do not tell a coherent story about one device, or that mutate across a session.
  • Residential-proxy behavior: a "consumer" IP whose timezone, language, and ASN history don't line up, rotating across requests.
  • Action cadence: a burst of card submissions in a few minutes is intent, not identity. No user-agent string will tell you that; the sequence of actions will.
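The action-cadence signal in particular lends itself to a short sketch: a sliding window over the timestamps of one sensitive action, such as card submissions. The class name and the thresholds are illustrative assumptions, not recommended values.

```python
from collections import deque


class CadenceMonitor:
    """Sliding-window counter for one sensitive action in one session."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def record(self, timestamp):
        """Record one action; return True when the window holds too many."""
        self.timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop events that have aged out of the window.
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps) > self.max_events


# Illustrative threshold: more than 3 card submissions within 120 seconds.
monitor = CadenceMonitor(max_events=3, window_seconds=120)
flags = [monitor.record(t) for t in (0, 10, 20, 30)]
```

Here the fourth submission, 30 seconds in, is the first to trip the threshold. The same four submissions spread 100 seconds apart would not.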

You classify on identity plus intent together. A session that passes every identity check but fails on runtime and cadence is exactly the malicious-agent case that network-only tooling waves through.
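One way to picture the combination is a rule-based classifier in which intent evidence outranks identity claims. Everything here is an assumption made for illustration: the field names, the "two intent hits" threshold, and the "ambiguous" fallback are not taken from any specific detection product.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    # Identity signals (who the session claims to be)
    declared_crawler_verified: bool = False
    declared_consumer_agent: bool = False
    # Intent signals (what the runtime and behavior reveal)
    webdriver_anomaly: bool = False
    automation_artifacts: bool = False
    fingerprint_drift: bool = False
    proxy_mismatch: bool = False
    cadence_burst: bool = False


def classify(s):
    intent_hits = sum([
        s.webdriver_anomaly,
        s.automation_artifacts,
        s.fingerprint_drift,
        s.proxy_mismatch,
        s.cadence_burst,
    ])
    # Intent is checked first: hostile behavior wins even if identity passes.
    if s.cadence_burst or intent_hits >= 2:
        return "malicious_agent"
    if s.declared_crawler_verified:
        return "good_bot"
    if s.declared_consumer_agent:
        return "consumer_agent"
    if intent_hits == 1:
        return "ambiguous"  # not enough evidence yet; keep collecting
    return "human"
```

Checking intent before identity is the design choice that matters. A session declaring itself a consumer agent is still classified as malicious once a cadence burst appears, which matches the point that a class can change mid-session.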

Why this matters more in 2026

The malicious class got cheap. cside's 2026 web security research reports that playwright-stealth installs were roughly ten times higher through 2025, a clean proxy for how fast anti-detection automation moved from a niche into mainstream attack tooling (source: cside 2026 research report).

At the same time the welcome classes grew. AI-search crawlers now drive real discovery, and consumer shopping agents complete real purchases. So the two ends of the taxonomy expanded at once: more automation you want to allow, and more automation built specifically to look like it. That is why a binary detector fails: it has no column for "automated and welcome." For the deep mechanics of how the malicious end hides, see stealth browsers and anti-detect browsers, explained. The same signals catch the credential-stuffing runs that hit the login once an agent shifts from browsing to attacking accounts.

Map each class to one enforcement action

Once a session is classified, enforcement should be deterministic. Five actions cover the taxonomy:

  1. Allow: humans and verified good bots in their expected paths. Log and move on.
  2. Monitor: neutral automation and any session whose class is still ambiguous. Collect signals, don't add friction yet.
  3. Challenge / throttle: sessions trending malicious. Slow them, step up verification, or rate-limit the specific action (login, checkout) rather than the whole site.
  4. Serve agent content: a known consumer agent on a path where you'd rather guide than block. Give it a purpose-built view or a "contact us" step instead of leaking raw pricing to a scraper-shaped session.
  5. Block: confirmed malicious intent such as card enumeration, credential stuffing, and account-abuse runs.

Two rules keep this honest. First, scope enforcement to the action, not the visitor: challenge the checkout submission, don't 403 the homepage. Second, make the decision per page: a stealth browser reading a blog post is a monitor case; the same session on your card vault is a block case. For the playbook on the block end, see how to block AI agents on your website, and for the payment-fraud variant, how to block AI card-testing agents.
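A deterministic mapping from class and page to a single action might look like the sketch below. The path prefixes, the class labels, and the choice to monitor a good bot that strays onto a sensitive path are assumptions for the example rather than a prescribed policy.

```python
# Hypothetical prefixes for pages where actions carry fraud or abuse risk.
SENSITIVE_PATHS = ("/login", "/checkout", "/account")


def decide(traffic_class, path, confirmed_malicious=False):
    """Return exactly one enforcement action for this class on this page."""
    sensitive = path.startswith(SENSITIVE_PATHS)
    if traffic_class == "human":
        return "allow"
    if traffic_class == "good_bot":
        # Verified crawlers are allowed in their expected paths only.
        return "monitor" if sensitive else "allow"
    if traffic_class in ("neutral_automation", "ambiguous"):
        return "monitor"
    if traffic_class == "consumer_agent":
        # Guide the agent on sensitive paths instead of blocking it.
        return "serve_agent_content" if sensitive else "allow"
    if traffic_class == "malicious_agent":
        if not sensitive:
            return "monitor"  # e.g. a stealth browser reading a blog post
        return "block" if confirmed_malicious else "challenge"
    raise ValueError(f"unknown traffic class: {traffic_class}")
```

For instance, `decide("malicious_agent", "/blog/post")` returns "monitor", while the same class on `/checkout` returns "challenge", or "block" once malicious intent is confirmed.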

Where the classification has to happen

This taxonomy only works if you can read intent, and intent lives in the browser. AI crawlers that never execute JavaScript never fire your analytics, so they're invisible to GA4 and PostHog. Consumer and malicious agents run real browsers and look human to those same tools. Neither end is separable at the analytics layer, and most of the malicious class passes network-layer checks by design: clean IP, valid user-agent, plausible request shape.

cside watches the browser runtime in real time. It captures the device and real IP, surfaces the automation and fingerprint signals that reveal intent, flags AI agents and stealth browsers inside the page, and exposes those signals via API so you can drive the allow / monitor / challenge / serve / block decision in your own workflow. That is the layer where a human, a good bot, and a malicious agent finally stop looking alike.

Simon Wijckmans
Founder & CEO

Founder and CEO of cside. Previously a product manager on Cloudflare Page Shield (now Cloudflare Client-Side Security). Co-chair of the W3C Anti-Fraud Community Group and a Forbes 30 Under 30 honoree. Building accessible security against client-side attacks — web security is not an enterprise-only problem.

Frequently Asked Questions

What classes of traffic should you separate?

Five operational classes cover most traffic: humans, good bots you want (search and AI crawlers, partner integrations), neutral automation you tolerate (uptime monitors, link checkers), consumer AI agents acting for a real user (shopping and research agents), and malicious agents (scrapers, card testers, account-abuse bots, stealth browsers). The classes matter because each one earns a different enforcement action. Collapsing them into 'bot vs not' throws away the decision you actually need to make.

What is the difference between identity and intent?

Identity is who a session claims to be: a user-agent string, a declared crawler name, a fingerprint. Intent is what the session is trying to do right now: read an article, lock inventory, enumerate cards, create accounts. Identity is cheap to spoof and stable across a session; intent is revealed by behavior and shifts as the session moves from browsing to a transaction attempt. Enforcement should key off intent because that is the thing an attacker cannot fake for free.

Why not just block all automated traffic?

Because good bots and consumer AI agents are now part of your traffic. Blanket blocking removes search and AI-search crawlers that drive discovery, breaks partner integrations, and turns away shopping agents that complete real purchases for real customers. It also destroys your own visibility. Once you 403 everything, you stop learning what was actually hitting your site. The goal is a policy that allows the useful classes and reserves friction for the harmful ones.
