
How to Block DeepSeekBot on Your Website

DeepSeekBot crawls your site for a Chinese AI company. Learn how to block it with robots.txt, IP rules, and the real data sovereignty risks it raises.

Jun 22, 2026 • 8 min read

Mike Kutlu Client-Side Security Consultant


TL;DR: block DeepSeekBot on data sovereignty grounds

The data sovereignty risk: The default posture is that any declared crawler that respects robots.txt is safe to allow. DeepSeek's compliance track record is less thoroughly documented than GPTBot's or ClaudeBot's in independent reporting, which changes the risk calculation for regulated data.
Two-layer enforcement: DeepSeekBot uses a declared user-agent in the DeepSeek family and DeepSeek publishes IP ranges; a Disallow: / in robots.txt combined with a quarterly IP-range review at the firewall gives you enforcement that does not rely on the crawler self-policing.
The decision: If you run in a regulated industry, hold sensitive IP, or have an explicit policy on data transfer to Chinese jurisdiction, block DeepSeekBot at both layers before its next crawl. If you have no such policy, robots.txt alone is a proportionate response.

Short on time? See cside's AI-agent detection. It covers everything below in one deployment.

DeepSeekBot is the web crawler operated by DeepSeek, the Chinese AI company that gained widespread attention in early 2025 with models reported to match or exceed GPT-4-class performance at a fraction of the reported training cost. The crawler collects web content for training and improving DeepSeek's AI models.

For many site owners, blocking DeepSeekBot is a data sovereignty decision as much as a technical one. The same robots.txt approach that works for GPTBot and ClaudeBot applies here, with some additional considerations.

What is DeepSeekBot?

Quick answer: DeepSeekBot is a web crawler operated by DeepSeek, a Chinese AI research company. It collects publicly available web content to train DeepSeek's language models. It identifies itself with a declared user-agent string and is an HTTP crawler that does not execute JavaScript or interact with web application interfaces.

DeepSeek's crawler uses user-agent identifiers in the DeepSeek family. Like other declared AI training crawlers, it makes HTTP GET requests, reads text content, and is designed to respect robots.txt directives.

DeepSeek operates under Chinese law and data regulations, which creates a different risk profile from crawlers operated by US-based companies. Content collected by DeepSeekBot may be subject to data access requirements that apply to Chinese tech companies under Chinese jurisdiction. This is relevant context for organisations with regulatory obligations, sensitive intellectual property, or data governance policies that consider data origin.

How to block DeepSeekBot with robots.txt

Quick answer: Add DeepSeekBot to your robots.txt with a Disallow: / directive. If DeepSeek's crawler respects robots.txt (which it is designed to do), this blocks all collection from your site. Use path-level rules for more granular control.

To block DeepSeekBot from your entire site:

User-agent: DeepSeekBot
Disallow: /

If you want to allow crawling of some content while protecting sensitive areas:

User-agent: DeepSeekBot
Disallow: /account/
Disallow: /checkout/
Disallow: /api/
Allow: /blog/
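
Before deploying either rule set, it can help to check how a standards-following parser reads it. The sketch below is a minimal illustration using Python's standard-library urllib.robotparser. The example.com URLs and the SomeOtherBot name are placeholders, and the result only shows how a compliant parser interprets the file, not how DeepSeekBot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets from this article, as they would appear in robots.txt.
PATH_RULES = """\
User-agent: DeepSeekBot
Disallow: /account/
Disallow: /checkout/
Disallow: /api/
Allow: /blog/
"""

FULL_BLOCK = """\
User-agent: DeepSeekBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host; swap in your own paths.
    for path in ("/account/settings", "/blog/post", "/pricing"):
        url = "https://example.com" + path
        print(path, is_allowed(PATH_RULES, "DeepSeekBot", url))
```

Note that /pricing comes back as allowed under the path-level rules: anything not matched by a Disallow line is crawlable by default, so the Allow: /blog/ line documents intent rather than changing behaviour. Python's parser applies the first matching rule, while RFC 9309 specifies longest-match precedence; the rules here do not overlap, so both readings agree.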

Unlike GPTBot and ClaudeBot, which have well-documented compliance records, DeepSeekBot's robots.txt compliance history is less thoroughly documented in public reporting. If enforcement reliability matters, supplement robots.txt with IP-level blocking. The same caution applies to other training crawlers with thin or disputed compliance records, such as ByteDance's Bytespider.
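
Because the compliance record is thin, one option is to check your own access logs for requests that arrive after the Disallow rule is live. The following is a rough sketch, assuming logs in the common "combined" format and a case-insensitive match on the substring "deepseek" in the user-agent. Both are assumptions, and the sample user-agent string in the usage comment is hypothetical rather than DeepSeek's documented string.

```python
import re
from collections import Counter

# Combined Log Format:
# ip - user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Assumption: any user-agent containing this substring is in the DeepSeek family.
UA_MARKER = "deepseek"


def deepseek_hits(lines, disallowed_prefixes=("/",)):
    """Count DeepSeek-family requests per path under the disallowed prefixes."""
    prefixes = tuple(disallowed_prefixes)
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or UA_MARKER not in match["ua"].lower():
            continue
        path = match["path"]
        # Fetching robots.txt itself is expected behaviour, not a violation.
        if path == "/robots.txt":
            continue
        if path.startswith(prefixes):
            hits[path] += 1
    return hits


# Usage (hypothetical log path):
# with open("access.log", encoding="utf-8", errors="replace") as fh:
#     print(deepseek_hits(fh, ("/account/", "/checkout/", "/api/")))
```

A non-empty result after the crawler has had time to re-read robots.txt is a signal to move enforcement to the network layer. An empty result only shows that nothing declared itself as DeepSeek, since user-agents can be spoofed or omitted.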

Data sovereignty considerations

Quick answer: DeepSeek is incorporated in China and operates under Chinese law. Content collected by its crawler may be subject to data access requirements that apply to Chinese technology companies. For organisations in regulated industries or with explicit data governance policies, this distinction carries compliance weight beyond what it would for a US-based crawler.

This is not a claim that DeepSeek actively misuses data. It is a statement about jurisdiction and the legal framework under which collected data exists. Organisations that maintain policies restricting data transfer to certain jurisdictions, or that have IP concerns about AI training data origin, have legitimate technical and legal reasons to block DeepSeekBot specifically rather than as part of a blanket AI crawler policy.

Security teams in financial services, healthcare, government contractors, and technology companies with proprietary IP have been among the earliest to add DeepSeekBot to their crawler blocklists for exactly this reason.

IP-level blocking for DeepSeekBot

Quick answer: DeepSeek publishes its crawler's IP ranges in its documentation. Adding these ranges to your firewall or CDN provides enforcement that does not depend on robots.txt compliance. Given the lower compliance certainty compared to US-based crawlers, IP blocking is the more reliable approach for organisations with strict requirements.

To implement IP-level blocking:

1. Locate DeepSeek's current published IP ranges in its official documentation.
2. Add these ranges to your firewall, CDN, or reverse proxy deny list.
3. Set a review cycle for updates, as IP ranges expand with crawl infrastructure growth.

As with all crawler IP lists, this requires ongoing maintenance. A quarterly review cycle is sufficient for most organisations.
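
If you enforce the deny list in application code rather than at a firewall or CDN, the check itself is simple. This is a minimal sketch using Python's standard-library ipaddress module. The CIDR ranges are placeholders taken from the reserved documentation blocks, not DeepSeek's ranges, and the function names are illustrative.

```python
import ipaddress

# PLACEHOLDER ranges from the reserved documentation blocks (RFC 5737, RFC 3849).
# They are NOT DeepSeek's ranges; replace them with the currently published list
# and refresh them on your review cycle.
BLOCKED_RANGES = ["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"]


def compile_ranges(cidrs):
    """Parse CIDR strings once; raises ValueError on a malformed entry."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_blocked(client_ip, networks):
    """Return True if client_ip falls inside any of the parsed networks."""
    addr = ipaddress.ip_address(client_ip)
    # Compare only like with like: IPv4 against IPv4, IPv6 against IPv6.
    return any(addr.version == net.version and addr in net for net in networks)


if __name__ == "__main__":
    networks = compile_ranges(BLOCKED_RANGES)
    for ip in ("192.0.2.77", "203.0.113.5", "2001:db8::1"):
        print(ip, is_blocked(ip, networks))
```

Parsing the list up front means a typo in a range fails loudly at load time rather than silently never matching, which matters for a list that is edited by hand every quarter.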

Layered enforcement diagram for DeepSeekBot showing robots.txt as a crawler intent signal, IP range blocking as network enforcement, and browser-layer detection as the control for undeclared DeepSeek-powered agent sessions that do not identify as the crawler

Enforcement layer	Stops declared DeepSeekBot crawler	Stops DeepSeek-powered agent in a real browser session
robots.txt rule	Yes (if respected)	No
IP-range blocklist (firewall / CDN)	Yes	No
cside browser-layer behavioural detection	Yes	Yes

An agent that opens a headless Chromium session presents a legitimate Chrome user-agent and a data-centre IP that is not on any DeepSeek range, so neither robots.txt nor an IP blocklist applies. Only browser-layer behavioural detection sees it.

DeepSeekBot vs. DeepSeek-powered agents

Quick answer: Blocking DeepSeekBot addresses DeepSeek's training data pipeline. If DeepSeek builds or enables agentic AI products that browse the web on users' behalf, those sessions would not be DeepSeekBot and would not be affected by your robots.txt rules.

DeepSeek's public product focus has been on language model capabilities rather than agentic browsing tools, but this is an evolving space. The structural gap applies here as it does to OpenAI and Anthropic: the declared crawler and any future interactive agents are separate systems.

Flow diagram of a DeepSeek-powered agent running a headless Chromium session with a legitimate Chrome user agent, browsing six pages in under 40 seconds with behavioural anomalies that network-layer tools miss but browser-layer detection flags

A DeepSeek-powered research agent's session leaves a browser-layer fingerprint that network tools miss. It runs as headless Chromium presenting a legitimate Chrome user-agent, from a data-centre IP in a non-Chinese jurisdiction, so no IP or user-agent filter triggers. It audits six pages (Home, Pricing, Pricing detail, Docs, Docs API, Changelog) in under 40 seconds, with zero dwell time on images, no scroll-back, and strictly sequential navigation. Network-layer tools see a clean request and stop there; in cside's controlled testing, traditional tools missed AI agents in real browser sessions in 81 of 100 scenarios.

Organisations that want comprehensive protection against all DeepSeek-related automated access to their sites should monitor DeepSeek's product announcements for agentic products, particularly any browser-use or computer-use capabilities that would create undeclared browser sessions. Browser-layer detection covers those scenarios; robots.txt does not.

Browser-layer detection: beyond the declared crawler

Quick answer: Blocking DeepSeekBot addresses DeepSeek's declared training crawler. It does not address DeepSeek-powered agents or applications that browse your site in real browser sessions on behalf of users. Those sessions require browser-layer behavioural detection, not robots.txt rules.

DeepSeek's public product roadmap has focused on language model capability rather than agentic browsing tools, but the category is evolving. Any DeepSeek-powered tool that uses real browser automation would present as a standard browser session with no connection to DeepSeekBot's declared user-agent. Your robots.txt block would be irrelevant to that traffic. The same blind spot affects content protection more broadly, which is why blocking AI content scrapers increasingly depends on behaviour rather than self-declaration.

To understand what that gap looks like in practice, imagine a DeepSeek-powered research agent tasked with compiling competitor intelligence on a SaaS vendor. It opens a headless Chromium session, navigates the site's pricing and documentation pages in sequence, and extracts structured data. The session presents a legitimate Chrome fingerprint and originates from a data centre in a non-Chinese jurisdiction, so neither the IP origin nor the user-agent triggers any filter. The agent completes a full audit of six pages in under 40 seconds, with zero dwell time on images and no scroll-back behaviour. Those interaction anomalies are only visible at the browser layer. In cside's controlled testing, traditional tools missed AI agents operating inside real browser sessions in 81 out of 100 scenarios, precisely because network-layer tools see a clean request and stop there.
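
To make the signals in that scenario concrete, here is a toy heuristic that scores a session on the four anomalies described: many pages in a short time, zero image dwell, no scroll-back, and no revisits (used here as a stand-in for "strictly sequential navigation"). It is an illustrative sketch, not cside's detection logic. The PageView fields, the thresholds, and the three-of-four rule are all assumptions taken from the example rather than from any product.

```python
from dataclasses import dataclass


@dataclass
class PageView:
    """Hypothetical per-page telemetry collected in the browser."""
    path: str
    seconds_on_page: float
    image_dwell_seconds: float
    scrolled_back: bool


def session_signals(views, min_pages=6, max_total_seconds=40.0):
    """Evaluate the four anomalies from the scenario for one session."""
    total = sum(v.seconds_on_page for v in views)
    paths = [v.path for v in views]
    return {
        "fast_multi_page": len(views) >= min_pages and total < max_total_seconds,
        "zero_image_dwell": all(v.image_dwell_seconds == 0 for v in views),
        "no_scroll_back": not any(v.scrolled_back for v in views),
        "no_revisits": len(set(paths)) == len(paths),
    }


def looks_automated(views, required=3):
    """Flag a session when at least `required` signals fire (assumed threshold)."""
    if not views:
        return False
    return sum(session_signals(views).values()) >= required


if __name__ == "__main__":
    agent = [
        PageView(p, 6.0, 0.0, False)
        for p in ("/", "/pricing", "/pricing/detail", "/docs", "/docs/api", "/changelog")
    ]
    print(session_signals(agent))
    print(looks_automated(agent))
```

No single signal is conclusive, since a fast human reader can trip one or two of them. Requiring several to coincide is a simple way to reduce false positives in a sketch like this, and a production system would weigh far more inputs.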

cside AI agent detection dashboard

More broadly, the data sovereignty concern that makes DeepSeekBot worth blocking applies equally to any AI-powered session accessing your site from infrastructure in jurisdictions with different data governance frameworks. cside's browser-layer monitoring surfaces named and unnamed agents by behavioural signal rather than self-declaration, including sessions that present no identifying information at all.

Client-Side Security Consultant Mike Kutlu

Client-side security consultant at cside. 10+ years of experience implementing technology solutions for enterprises (previously at Oracle, Cloudflare, and Splunk). Now helping teams use client-side intelligence to catch & reduce fraud.



Frequently Asked Questions

What is DeepSeekBot?

DeepSeekBot is the web crawler operated by DeepSeek, a Chinese AI company that develops large language models. It collects publicly available web content to train DeepSeek's AI systems. It uses a declared user-agent string and is designed to respect robots.txt directives. DeepSeek operates under Chinese law and data regulations.

How do I block DeepSeekBot?

Add User-agent: DeepSeekBot followed by Disallow: / to your robots.txt file to block it from your entire site. For path-level control, use specific Disallow rules. Given DeepSeekBot's less-documented compliance record compared to GPTBot or ClaudeBot, supplementing robots.txt with IP-level blocking is worth considering.

Why is blocking DeepSeekBot a data sovereignty decision?

DeepSeek is a Chinese company operating under Chinese jurisdiction and data law. Organisations with policies restricting data transfer to certain jurisdictions, or with regulatory requirements that govern where their data can be accessed, have specific compliance reasons to block DeepSeekBot independently of a general AI crawler policy.

Does blocking DeepSeekBot remove my content from DeepSeek's models?

No. Blocking DeepSeekBot prevents your content from being collected in future training crawls. Content collected before your block was added may already be part of existing training data and models. Blocking the crawler also does not affect any DeepSeek-powered products or agents that browse the web through browser sessions rather than the declared crawler.

Does DeepSeekBot respect robots.txt?

DeepSeekBot is designed to respect robots.txt directives, but its compliance track record is less thoroughly documented in independent reporting than that of GPTBot (OpenAI) or ClaudeBot (Anthropic). Organisations with strict requirements should consider IP-level blocking as an enforcement complement to robots.txt. A quarterly review of DeepSeek's published IP ranges keeps that enforcement layer current.
