Catching bots by the way they move: behavioral cursor detection

How cside's cursor_v2 model scores mouse movement to catch the stealth bots that already beat fingerprint and IP checks.

Jun 23, 2026 · 10 min read

Most automated traffic is easy to turn away. A user-agent check and a fingerprint test stop the vast majority of it before anything behavioral matters. The traffic worth engineering against is the small fraction that spoofs all of that, and that traffic still has to do one thing it cannot easily fake: move a cursor like a person.

This is a look at cursor_v2, a compact neural detector that scores how a mouse actually moved. At about ten thousand parameters, it is small enough to run on every gesture, and in our testing it holds up against every method we know of for synthesizing a human-looking cursor. Because this is a live security model, this post deals in results: what it catches, how little it costs, and where its limits are. It does not describe the decision internals.

The short version:

  • cursor_v2 turns a single mouse movement into a human/bot score in under a millisecond, from a ~10,600-parameter model that ships as ~0.12 MB.
  • Against seven independent cursor generators (geometric, physical, neural human-imitation, and paths optimized specifically to beat it), it flags 85.6% to 100% of bot movements while flagging 0.0% of a held-out set of real human movements.
  • On a generator it was never trained on, it scores AUC 0.999 versus 0.920 for the previous generation.
  • We pointed an LLM agent on a stealth browser (Camoufox), one that defeats fingerprint-level detection, at a test storefront. cursor_v2 flagged 100% of its movements, even with motion-humanization turned on.

The adversary worth engineering against

Bot detection is a stack of filters, and most of the volume drops out at the cheap ones: an obviously automated user agent, a headless-browser fingerprint, a datacenter IP. The bots that survive those layers are the ones built by people who read the same research we do. They run real browsers, patch away the fingerprint tells, and route through residential networks. By the time such a session reaches a page, the static signals look clean.

What is left is behavior. A real person browsing a site produces a continuous stream of input, and human cursor motion is the product of a noisy biological control loop that is genuinely hard to reproduce. You can copy a fingerprint exactly. Copying the way a hand moves a mouse, convincingly, across thousands of gestures, is a different problem, and an independent one. A bot can be perfect at the fingerprint layer and still give itself away the instant it moves.

A detector small enough to run on everyone

cursor_v2 takes one cursor path, the samples a movement leaves behind as it travels from one point to another, and returns a single probability that the movement was produced by automation. It is fully neural, exported to ONNX, and runs on a CPU. There is no GPU, no service call, no per-request model server.

The reason that matters is economic. A detector you can only afford to run on suspicious sessions is a detector attackers learn to stay below. One that costs a fraction of a millisecond and a tenth of a megabyte can run on every movement of every session, which is exactly where behavioral signal lives.

The shipped footprint:

  • About 10,600 parameters total.
  • 0.74 ms per movement on a single CPU thread.
  • ~0.12 MB shipped model weight.
  • ~64 MB peak memory at inference.

At that footprint, a single core scores on the order of a thousand movements per second, cheap enough to be always-on.
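The throughput figure follows directly from the per-movement latency. A quick back-of-envelope check, assuming movements are scored one at a time on a single thread with no batching and no other overhead (the function name is illustrative, not part of any cside API):

```python
def movements_per_second(latency_ms: float) -> float:
    """Sequential single-thread throughput implied by a per-movement latency."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


# 0.74 ms per movement is the latency quoted in the footprint list above.
throughput = movements_per_second(0.74)
print(f"{throughput:.0f} movements/second per core")
```

Real deployments will land somewhat below this ceiling once input capture and scheduling are included, which is why "on the order of a thousand" is the safer way to state it.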

Every fake cursor has a neighborhood

Internally, the model represents each movement as a compact vector. We do not expose what goes into that representation, but we can show what comes out of it. Project a large sample of movements down to three dimensions and a structure appears on its own: real human motion occupies one region, and each distinct way of synthesizing a cursor lands in its own region, set apart from the humans. A geometric humanizer, a physics-based mouse model, and a neural network trained to imitate humans all look different from each other, and all look different from a person.

In an interactive 3-D projection of that motion embedding, humans form a single neighborhood while every bot generator forms its own separated cloud, including the neural model trained directly on human motion. The projection axes are arbitrary; the point is that the classes separate, not how the model decides.

The generator gauntlet

Separation in a projection is suggestive, not proof. The real question is what the shipped model does when handed motion it has never seen. So we built a gauntlet: seven independent ways to generate a cursor path, scored end-to-end through the exact model we deploy, against a control set of held-out real human movements.

The families fall into three groups. Held-out families use a generator the model was never trained on, the fair test of generalization. Hardened families are ones we deliberately trained the model to resist, including a corpus of paths search-optimized specifically to evade this detector; a high catch rate there confirms the hardening held. The third group is a single trivial sanity floor: an unhumanized generator.

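| Generator family            | Regimen      | n   | AUC   | Recall @ FPR ≤ 1% | Caught @ shipped threshold |
|-----------------------------|--------------|-----|-------|-------------------|----------------------------|
| Unhumanized generator       | sanity floor | 500 | 0.995 | 98.6%             | 98.2%                      |
| Geometric humanizer         | held-out     | 500 | 0.999 | 99.6%             | 99.6%                      |
| Quantized humanizer         | held-out     | 500 | 0.999 | 99.6%             | 99.6%                      |
| Physical motion model       | hardened     | 499 | 1.000 | 99.4%             | 97.0%                      |
| Neural human-imitation      | hardened     | 500 | 1.000 | 100.0%            | 100.0%                     |
| Evasion-optimized paths     | hardened     | 500 | 0.999 | 97.6%             | 85.6%                      |
| Adversarial-exploit corpus  | hardened     | 500 | 1.000 | 100.0%            | 96.0%                      |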

Control: 160 held-out human movements, scored through the same pipeline, with a false-positive rate of 0.0% at the shipped threshold and a median score of 0.000. Every bot number above is measured against that same control.
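For readers who want to compute the same three columns on their own data, the sketch below shows one common way to derive them from two lists of scores. It is an illustration under stated assumptions, not cside's evaluation code: scores are assumed to lie in [0, 1] with higher meaning more bot-like, a movement counts as flagged when its score is at or above the threshold, and the function names, sample scores, and the 0.5 threshold are all hypothetical.

```python
from typing import Sequence


def auc(human: Sequence[float], bot: Sequence[float]) -> float:
    """Probability that a random bot score outranks a random human score.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    This is O(n*m), which is fine for evaluation sets of a few hundred rows.
    """
    wins = 0.0
    for b in bot:
        for h in human:
            if b > h:
                wins += 1.0
            elif b == h:
                wins += 0.5
    return wins / (len(bot) * len(human))


def flag_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of movements scoring at or above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def recall_at_fpr(human: Sequence[float], bot: Sequence[float],
                  max_fpr: float = 0.01) -> float:
    """Best bot recall over thresholds whose human flag rate stays <= max_fpr."""
    # Candidate thresholds: every observed score, plus "flag nothing".
    candidates = sorted(set(human) | set(bot)) + [float("inf")]
    best = 0.0
    for t in candidates:
        if flag_rate(human, t) <= max_fpr:
            best = max(best, flag_rate(bot, t))
    return best


# Hypothetical scores, for illustration only.
human_scores = [0.00, 0.00, 0.01, 0.02, 0.05]
bot_scores = [0.03, 0.90, 0.97, 0.99, 1.00]
assumed_threshold = 0.5  # placeholder; the shipped threshold is not published

print("AUC:", auc(human_scores, bot_scores))
print("Recall @ FPR <= 1%:", recall_at_fpr(human_scores, bot_scores))
print("Caught @ threshold:", flag_rate(bot_scores, assumed_threshold))
print("Human FPR @ threshold:", flag_rate(human_scores, assumed_threshold))
```

The distinction between the last two columns is the point: "Recall @ FPR ≤ 1%" lets the threshold float to the best value that keeps human flags under 1%, while "Caught @ shipped" fixes the threshold in advance, which is why the two can differ for the same generator.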

Catch rate per bot generator

The honest data point: even the corpus built by searching for paths that beat this detector is still flagged 85.6% of the time, while real humans flag at zero. Adversarial pressure moves the number; it does not break it.

The same result viewed as a score distribution: humans and bots pile up at opposite ends of the scale, with very little in between. Human output collapses near zero, pooled bot movements across all generators collapse near one, and the decision threshold sits in the empty gap between them.

Score distribution, humans vs bots

The real test: a stealth browser driven by an agent

A path generator is a clean lab subject. A modern stealth browser driven by an AI agent is the actual adversary, so we ran one. We built an LLM-driven crawler on top of Camoufox, a stealth browser specifically engineered to defeat fingerprint-level detection, gave it a goal, and let it shop its way through a test storefront: find a product, add to cart, head to checkout. As it navigated, we captured its real mouse movements and scored each one with the shipped detector.

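| Camoufox agent         | Movements | Flagged as bot | Session flagged        |
|------------------------|-----------|----------------|------------------------|
| Default mode           | 6         | 6 (100%)       | yes, after 3 movements |
| Motion-humanization on | 6         | 6 (100%)       | yes, after 3 movements |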

Every movement was flagged, in the agent's default mode and with its built-in motion-humanization explicitly enabled. The same scorer flags fewer than 1% of real human movements, so this is not a detector that simply distrusts everything.

Camoufox is excellent at what it is built for, and it would very likely sail past a fingerprint-only check. It was caught anyway, because stealth at the fingerprint layer does not buy stealth at the motion layer. The two fail independently, and that independence is the whole point of running detection in depth. For more on how these tools work, see "stealth browsers and anti-detect browsers explained" and "how cside detects AI agents on stealth browsers".

Generalizing past the training distribution

One number deserves singling out, because it is the one that is easy to fake and easy to misread. Almost any cursor detector scores around 0.99 AUC on the kind of motion it trained on; that figure is close to meaningless. The figure that matters is transfer, performance on a generator that was held out of training entirely.

On a humanizer family neither model trained on, cursor_v2 reaches AUC 0.999, while the previous-generation detector falls to 0.920 and loses most of its recall. The gain is in generalization, not in-distribution polish.

Cross-generator AUC comparison

What we can prove, and what we cannot

A security writeup is only worth reading if it is honest about its edges, so here are ours.

In-distribution accuracy is not robustness. Every number on motion the model has seen is near-perfect and we do not lean on any of it. The claims here rest on held-out generators and a live stealth-browser test.

The cross-generator result is a strong point estimate, not a universal guarantee. It is calibrated on a finite sample of real humans and one held-out humanizer family. Read the 0.999 as cleared for that family, not for every cursor that will ever exist. The honest open lever is breadth of real human motion to calibrate against, more than it is the model itself.
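To make the finite-sample caveat concrete: observing zero false positives in 160 human movements does not show that the true false-positive rate is zero. A minimal sketch of the standard one-sided upper confidence bound for a zero-event binomial sample follows. It assumes the 160 control movements can be treated as independent draws, which is optimistic if many of them come from the same few users; the function name is illustrative.

```python
def zero_failure_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Upper bound on the true rate when 0 events were seen in n independent trials.

    Solves (1 - p) ** n = 1 - confidence for p, which is the exact
    (Clopper-Pearson) one-sided bound in the zero-event case.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


# 160 is the size of the held-out human control set reported above.
bound = zero_failure_upper_bound(160)
print(f"95% upper bound on the false-positive rate: {bound:.2%}")
```

With n = 160 the bound comes out a little under 2%, close to the familiar "rule of three" approximation of 3/n. Tightening it is a matter of collecting more real human motion, which matches the open lever named above.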

In production we debounce. A single odd gesture never flags a person; the deployed scorer waits for several bot-scoring movements in a row before it acts. The numbers in this post are the harder per-movement view, before that smoothing.
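A consecutive-run debouncer of the kind described here can be sketched in a few lines. This is an illustration, not the deployed logic: the class name, the 0.5 threshold, the sticky flag, and the run length of 3 are all assumptions (the run length is chosen only to echo the "after 3 movements" result in the Camoufox table).

```python
class ConsecutiveDebouncer:
    """Flags a session only after `required_run` bot-scoring movements in a row."""

    def __init__(self, threshold: float = 0.5, required_run: int = 3) -> None:
        if required_run < 1:
            raise ValueError("required_run must be at least 1")
        self.threshold = threshold
        self.required_run = required_run
        self.run = 0          # current streak of bot-scoring movements
        self.flagged = False  # assumed sticky: stays True once tripped

    def observe(self, score: float) -> bool:
        """Feed one per-movement score; return whether the session is flagged."""
        if score >= self.threshold:
            self.run += 1
        else:
            self.run = 0  # a human-looking movement resets the streak
        if self.run >= self.required_run:
            self.flagged = True
        return self.flagged


# Hypothetical score streams, for illustration only.
human_session = ConsecutiveDebouncer()
human_flags = [human_session.observe(s) for s in [0.01, 0.93, 0.02, 0.00]]

bot_session = ConsecutiveDebouncer()
bot_flags = [bot_session.observe(s) for s in [0.99, 0.97, 1.00, 0.98]]

print(human_flags)
print(bot_flags)
```

In this sketch the single odd gesture in the human stream never trips the flag, while the bot stream trips it on its third consecutive high score. Requiring a streak trades a short detection delay for protection against isolated false positives.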

What this means for teams running detection

Strip the specifics and the lesson generalizes. The bots that matter are the ones that have already defeated your static checks, and the way to catch them is with a signal that fails independently of the one they beat. Behavior, how a session actually moves, clicks, and scrolls, is that signal, and it is the layer attackers budget for least. The same reasoning runs through "catching bots that don't want to be caught" and "how OpenClaw agents bypass bot detection".

The practical unlock is cost. A behavioral model small and fast enough to run on every visitor, not just the suspicious ones, removes the blind spot that sampling creates. cursor_v2 is one such signal: orthogonal to fingerprinting, cheap enough to be always-on, and, so far, holding up against the best cursor fakers we can build.

How cside fits

cside gives you full visibility into every script, request, and third-party touching your site, plus the behavioral signal to tell real users from the automation hiding among them. The cursor model is one layer in a detection stack that already separates humans, good bots, and malicious agents at the browser layer, where the static tells have already been spoofed.


Avneh
AI Researcher

Making machines learn. Applied math major currently developing the next generation of bot detection models at cside.

Frequently Asked Questions

What is cursor_v2?

cursor_v2 is a small neural model that scores how a mouse actually moved. It takes one cursor path, the samples a movement leaves behind as it travels from one point to another, and returns a single probability that the movement was produced by automation. It runs on the CPU in under a millisecond per movement, so it can score every gesture rather than only suspicious sessions.

Why is cursor movement harder to fake than a fingerprint?

You can copy a browser fingerprint exactly. Human cursor motion is the output of a noisy biological control loop, and reproducing that convincingly across thousands of gestures is a different and independent problem. A bot can look perfect at the fingerprint layer and still give itself away the instant it moves, which is why a motion signal catches sessions that static checks clear.

Can stealth browsers evade cursor_v2?

In our test they did not. We pointed an LLM agent driving Camoufox, a stealth browser engineered to defeat fingerprint-level detection, at a test storefront and scored its real mouse movements. cursor_v2 flagged 100% of them, in default mode and with the agent's built-in motion-humanization enabled. Stealth at the fingerprint layer does not buy stealth at the motion layer.

Does cursor_v2 flag real users?

Against a held-out control of real human movements, the false-positive rate was 0.0% at the shipped threshold, with a median human score of 0.000. In production the scorer also debounces: a single odd gesture never flags a person, because the deployed model waits for several bot-scoring movements in a row before it acts.
