AI Agent and Bot Detection: Telling Humans, Good Bots, and Malicious Agents Apart

A classification taxonomy and intent-based enforcement model for separating humans, good bots, and malicious agents, then deciding what each one gets.

Jul 11, 2026 • 6 min read

Simon Wijckmans, Founder & CEO

You have three problems wearing the same costume. A human reading your checkout page, a search crawler indexing it, and a stealth browser enumerating stolen cards against it can all present a plausible Chrome user-agent and a clean residential IP. Treat them as one bucket and you either block revenue or wave through fraud.

The fix is a taxonomy and a decision. Classify each session into a known class, read what it is trying to do, and map that to exactly one action: allow, monitor, challenge, serve agent content, or block. This post is the classification and decision framework. For the underlying signal mechanics, the guide to detecting AI agent traffic covers identity, network, browser, and behavioral signals; for picking a vendor, see how to choose an AI agent detection solution. When you need to know why older defenses miss this traffic, legacy bot detection in the age of AI agents explains the gap. Here, the job is deciding what to do once you can see the traffic.

A five-class taxonomy that maps to action

"Good bot vs bad bot" is too coarse, because a consumer's shopping agent is automated and welcome, while a search crawler is automated and welcome for a completely different reason. Split traffic into five operational classes, each tied to a default action:

Class	Examples	Intent	Default action
Human	Real visitors, logged-in customers	Browse, buy, manage account	Allow, monitor risk
Good bot	Googlebot, GPTBot, ClaudeBot, PerplexityBot, partner API bots	Index content, declared integration	Allow, rate-limit, verify identity
Neutral automation	Uptime monitors, link checkers, RSS/preview fetchers	Operational, low value, low harm	Monitor, rate-limit
Consumer AI agent	Shopping and research agents acting for a real user	Complete a task on behalf of a person	Allow or serve agent content
Malicious agent	Scrapers, card testers, account-abuse bots, stealth browsers	Extract value or commit fraud	Challenge or block

The class is not fixed for a session. A consumer agent browsing product pages is in the "allow" column right up until it starts submitting payment forms at machine speed, at which point its intent, and its class, have changed.
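As a rough illustration, the table and the reclassification rule can be written as data plus one function. This is a minimal sketch, not a production policy: the names `TrafficClass`, `DEFAULT_ACTION`, and `reclassify` are hypothetical, and the "machine speed" threshold is an invented placeholder that you would tune per site.

```python
from enum import Enum


class TrafficClass(Enum):
    HUMAN = "human"
    GOOD_BOT = "good_bot"
    NEUTRAL_AUTOMATION = "neutral_automation"
    CONSUMER_AGENT = "consumer_agent"
    MALICIOUS_AGENT = "malicious_agent"


# Default actions copied from the taxonomy table.
DEFAULT_ACTION = {
    TrafficClass.HUMAN: "allow, monitor risk",
    TrafficClass.GOOD_BOT: "allow, rate-limit, verify identity",
    TrafficClass.NEUTRAL_AUTOMATION: "monitor, rate-limit",
    TrafficClass.CONSUMER_AGENT: "allow or serve agent content",
    TrafficClass.MALICIOUS_AGENT: "challenge or block",
}

# Assumption: what counts as "machine speed" is site-specific; 3 is a placeholder.
MAX_PAYMENT_SUBMITS_PER_MINUTE = 3


def reclassify(current: TrafficClass, payment_submits_last_minute: int) -> TrafficClass:
    """Re-evaluate the class on every event instead of fixing it at session start."""
    if payment_submits_last_minute > MAX_PAYMENT_SUBMITS_PER_MINUTE:
        return TrafficClass.MALICIOUS_AGENT
    return current
```

Calling `reclassify` on each payment event, rather than once at session start, is what lets a consumer agent keep its "allow" treatment while it browses and lose it when its behavior changes.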

Identity tells you who; intent tells you what to do

Identity signals answer "who does this claim to be": user-agent, declared crawler name, fingerprint. They are necessary and almost free to spoof. A self-declaring GPTBot can be verified by cross-checking the request IP against the crawler's published ranges, which catches impersonators. But the dangerous classes never declare themselves.
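The IP cross-check is simple enough to sketch with Python's standard `ipaddress` module. The ranges below are placeholders taken from the reserved documentation blocks, not any crawler's real ranges, and `PUBLISHED_RANGES` and `is_verified_crawler` are hypothetical names. In practice you would load the list the crawler operator publishes and refresh it regularly.

```python
import ipaddress

# Placeholder CIDRs from the RFC 5737 documentation blocks, NOT real crawler ranges.
# Replace with the list the crawler operator actually publishes.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24", "198.51.100.0/24"],
}


def is_verified_crawler(declared_name: str, request_ip: str) -> bool:
    """True only if the request IP is inside a range published for the declared crawler."""
    ip = ipaddress.ip_address(request_ip)
    for cidr in PUBLISHED_RANGES.get(declared_name, []):
        if ip in ipaddress.ip_network(cidr):
            return True
    return False
```

A request that declares GPTBot from an address outside the published list fails the check, which is the impersonator case. A crawler name with no published list also fails, so an unknown declaration is never trusted by default.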

Intent signals answer "what is this session doing." They live in behavior and in the runtime, and they are far more expensive to fake convincingly:

navigator.webdriver: set, or suppressed too cleanly, on a session that otherwise looks like vanilla Chrome.
CDP / Runtime leaks: Chrome DevTools Protocol and driver artifacts (cdc_ properties left by ChromeDriver, stripped accessibility nodes) that betray Selenium, Playwright, or Puppeteer driving the page.
Fingerprint drift: WebGL, Canvas, and Audio context that do not tell a coherent story about one device, or that mutate across a session.
Residential-proxy behavior: a "consumer" IP whose timezone, language, and ASN history don't line up, rotating across requests.
Action cadence: a burst of card submissions in a few minutes is intent, not identity. No user-agent string will tell you that; the sequence of actions will.
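Of the signals above, action cadence is the easiest to sketch server-side: count one kind of action per session inside a sliding time window. The class name `CadenceMonitor` and the thresholds in the usage note are illustrative assumptions, not recommended values.

```python
from collections import deque


class CadenceMonitor:
    """Counts one kind of action (e.g. card submissions) in a sliding time window."""

    def __init__(self, max_events: int, window_seconds: float) -> None:
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._timestamps = deque()  # event times in seconds, oldest first

    def record(self, timestamp: float) -> bool:
        """Record an event; return True when the window holds more than max_events."""
        self._timestamps.append(timestamp)
        # Drop events that have aged out of the window.
        while self._timestamps and timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_events
```

With `CadenceMonitor(max_events=3, window_seconds=120)`, a fourth card submission within two minutes trips the flag, while the same four submissions spread over ten minutes do not. Keeping one monitor per session and per action type keeps the signal tied to the action rather than the visitor.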

You classify on identity plus intent together. A session that passes every identity check but fails on runtime and cadence is exactly the malicious-agent case that network-only tooling waves through.
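One way to picture "identity plus intent together" is a function that takes both kinds of signal as pre-computed flags. Everything here is a hypothetical sketch: the field names, the `"ambiguous"` label, and the choice to treat an unverified crawler claim as malicious are assumptions made for illustration, and a real classifier would weigh many more signals than five booleans.

```python
from dataclasses import dataclass


@dataclass
class SessionSignals:
    # Identity: who the session claims to be.
    declares_crawler: bool        # user-agent names a known crawler
    crawler_ip_verified: bool     # request IP is inside that crawler's published ranges
    # Intent: what the runtime and behavior show (hypothetical, pre-computed flags).
    automation_leak: bool         # e.g. navigator.webdriver or CDP artifacts
    fingerprint_incoherent: bool  # WebGL/Canvas/Audio disagree or drift
    abusive_cadence: bool         # e.g. a burst of card submissions


def classify(s: SessionSignals) -> str:
    """Return a coarse class from identity and intent signals together."""
    runtime_failed = s.automation_leak or s.fingerprint_incoherent
    if s.declares_crawler:
        # Assumption: a crawler name that fails IP verification is an impersonator.
        return "good_bot" if s.crawler_ip_verified else "malicious_agent"
    if runtime_failed and s.abusive_cadence:
        # Looks like a normal browser on identity, fails on runtime and cadence.
        return "malicious_agent"
    if runtime_failed or s.abusive_cadence:
        return "ambiguous"  # keep collecting signals before adding friction
    return "human"
```

The second branch is the case network-only tooling misses: nothing in the identity fields is wrong, and the verdict comes entirely from the runtime and cadence flags.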

Why this matters more in 2026

The malicious class got cheap. cside's 2026 web security research reports that playwright-stealth installs were roughly ten times higher through 2025, a clean proxy for how fast anti-detection automation moved from a niche into mainstream attack tooling (source: cside 2026 research report).

At the same time the welcome classes grew. AI-search crawlers now drive real discovery, and consumer shopping agents complete real purchases. So the two ends of the taxonomy expanded at once: more automation you want to allow, and more automation built specifically to look like it. That is why a binary detector fails: it has no column for "automated and welcome." For the deep mechanics of how the malicious end hides, see stealth browsers and anti-detect browsers, explained. The same signals catch the credential-stuffing runs that hit the login once an agent shifts from browsing to attacking accounts.

Map each class to one enforcement action

Once a session is classified, enforcement should be deterministic. Five actions cover the taxonomy:

Allow: humans and verified good bots in their expected paths. Log and move on.
Monitor: neutral automation and any session whose class is still ambiguous. Collect signals, don't add friction yet.
Challenge / throttle: sessions trending malicious. Slow them, step up verification, or rate-limit the specific action (login, checkout) rather than the whole site.
Serve agent content: a known consumer agent on a path where you'd rather guide than block. Give it a purpose-built view or a "contact us" step instead of leaking raw pricing to a scraper-shaped session.
Block: confirmed malicious intent such as card enumeration, credential stuffing, and account-abuse runs.

Two rules keep this honest. Scope enforcement to the action, not the visitor: challenge the checkout submission, don't 403 the homepage. And make the decision per page: a stealth browser reading a blog post is a monitor case; the same session on your card vault is a block case. For the playbook on the block end, see how to block AI agents on your website, and for the payment-fraud variant, how to block AI card-testing agents.
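Those two rules can be sketched as a decision function keyed on class and path. The path prefixes, the class labels (including a hypothetical `"ambiguous"` label for sessions not yet classified), and the choice to serve agent content on sensitive paths are all assumptions for illustration; substitute your own routes and policy.

```python
# Hypothetical sensitive routes; substitute your own.
SENSITIVE_PREFIXES = ("/checkout", "/login", "/account")


def decide(traffic_class: str, path: str) -> str:
    """Pick exactly one action for this request, scoped to the page being hit."""
    sensitive = path.startswith(SENSITIVE_PREFIXES)
    if traffic_class == "malicious_agent":
        # Same session, different page, different action.
        return "block" if sensitive else "monitor"
    if traffic_class == "ambiguous":
        return "challenge" if sensitive else "monitor"
    if traffic_class == "consumer_agent":
        return "serve_agent_content" if sensitive else "allow"
    if traffic_class in ("human", "good_bot"):
        return "allow"
    # Neutral automation and anything unrecognized: watch, add no friction.
    return "monitor"
```

For example, `decide("malicious_agent", "/blog/post")` returns `"monitor"` while `decide("malicious_agent", "/checkout/submit")` returns `"block"`, so the homepage never gets a 403 just because one session misbehaved at checkout. Defaulting unknown classes to monitor rather than allow keeps a labeling gap from becoming an open door.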

Where the classification has to happen

This taxonomy only works if you can read intent, and intent lives in the browser. AI crawlers that never execute JavaScript never fire your analytics, so they're invisible to GA4 and PostHog. Consumer and malicious agents run real browsers and look human to those same tools. Neither end is separable at the analytics layer, and most of the malicious class passes network-layer checks by design: clean IP, valid user-agent, plausible request shape.

cside watches the browser runtime in real time. It captures the device and real IP, surfaces the automation and fingerprint signals that reveal intent, flags AI agents and stealth browsers inside the page, and exposes those signals via API so you can drive the allow / monitor / challenge / serve / block decision in your own workflow. That is the layer where a human, a good bot, and a malicious agent finally stop looking alike.

