
How to Block DeepSeekBot on Your Website

DeepSeekBot crawls your site for a Chinese AI company. Learn how to block it with robots.txt, IP rules, and the real data sovereignty risks it raises.

Jun 22, 2026 6 min read

DeepSeekBot is the web crawler operated by DeepSeek, the Chinese AI company that gained widespread attention in early 2025 with models that matched or exceeded GPT-4 performance at a fraction of the training cost. The crawler collects web content for training and improving DeepSeek's AI models.

For many site owners, blocking DeepSeekBot is a data sovereignty decision as much as a technical one. The same robots.txt approach that works for GPTBot and ClaudeBot applies here, with some additional considerations.


What Is DeepSeekBot?

Quick answer: DeepSeekBot is a web crawler operated by DeepSeek, a Chinese AI research company. It collects publicly available web content to train DeepSeek's language models. It identifies itself with a declared user-agent string and is an HTTP crawler that does not execute JavaScript or interact with web application interfaces.

DeepSeek's crawler uses user-agent identifiers in the DeepSeek family. Like other declared AI training crawlers, it makes HTTP GET requests, reads text content, and is designed to respect robots.txt directives.
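Because the crawler self-identifies, the quickest way to spot it in your own traffic is a case-insensitive match on the user-agent header. The Python sketch below assumes the token appears as "DeepSeekBot" (confirm the exact string in your own logs); the sample user-agent strings are illustrative, not copied from DeepSeek documentation. Anyone can send this header, so a match shows self-declaration, not verified origin.

```python
from typing import Optional

# Assumed token: confirm the exact user-agent string in your own logs.
DEEPSEEK_UA_TOKEN = "deepseekbot"


def is_declared_deepseek_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the user-agent string self-identifies as DeepSeekBot.

    Matching is case-insensitive. A match only means the client *claims*
    to be DeepSeekBot; user-agent strings are trivially spoofed.
    """
    if not user_agent:
        return False
    return DEEPSEEK_UA_TOKEN in user_agent.lower()


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; DeepSeekBot/1.0)",  # illustrative only
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/126.0 Safari/537.36",
        None,
    ]
    for ua in samples:
        print(ua, "->", is_declared_deepseek_crawler(ua))
```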

DeepSeek operates under Chinese law and data regulations, which creates a different risk profile from crawlers operated by US-based companies. Content collected by DeepSeekBot may be subject to data access requirements that apply to Chinese tech companies under Chinese jurisdiction. This is relevant context for organisations with regulatory obligations, sensitive intellectual property, or data governance policies that consider data origin.


How to Block DeepSeekBot with robots.txt

Quick answer: Add DeepSeekBot to your robots.txt with a Disallow: / directive. If DeepSeek's crawler respects robots.txt (which it is designed to do), this blocks all collection from your site. Use path-level rules for more granular control.

To block DeepSeekBot from your entire site:

User-agent: DeepSeekBot
Disallow: /
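Before deploying, you can confirm the rule does what you expect with Python's standard-library urllib.robotparser, parsing the text locally rather than fetching it. This is a minimal sketch; example.com stands in for your own domain.

```python
from urllib.robotparser import RobotFileParser

# The whole-site block from above, as it would appear in /robots.txt.
ROBOTS_TXT = """\
User-agent: DeepSeekBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text locally, without any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # DeepSeekBot is refused everywhere...
    print(rp.can_fetch("DeepSeekBot", "https://example.com/"))           # False
    print(rp.can_fetch("DeepSeekBot", "https://example.com/blog/post"))  # False
    # ...while crawlers with no matching group are unaffected.
    print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))    # True
```

Because the file has no `User-agent: *` group, every other crawler falls through to the default of "allowed", which is why the block is DeepSeek-specific.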

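To allow crawling of some content while protecting sensitive areas, use path-level rules. Paths not covered by a Disallow rule remain crawlable by default, so the Allow line below is explicit rather than required: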

User-agent: DeepSeekBot
Disallow: /account/
Disallow: /checkout/
Disallow: /api/
Allow: /blog/
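The same local check works for path-level rules. In this sketch the paths are hypothetical examples of what a site might serve; the first three should come back blocked and the last two allowed.

```python
from urllib.robotparser import RobotFileParser

# The path-level rules from above.
ROBOTS_TXT = """\
User-agent: DeepSeekBot
Disallow: /account/
Disallow: /checkout/
Disallow: /api/
Allow: /blog/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Hypothetical paths: a mix of protected areas and public pages.
for path in ["/account/settings", "/checkout/", "/api/v1/products", "/blog/post", "/pricing"]:
    allowed = rp.can_fetch("DeepSeekBot", "https://example.com" + path)
    print(f"{path:<20} {'allowed' if allowed else 'blocked'}")
```

One caveat: urllib.robotparser applies the first matching rule in file order, whereas RFC 9309 specifies the longest match. The rules here never overlap, so both interpretations agree, but test overlapping Allow/Disallow pairs against the crawler's documented behaviour rather than this parser alone.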

Unlike GPTBot and ClaudeBot, which have well-documented compliance records, DeepSeekBot's robots.txt compliance history is less thoroughly documented in public reporting. If enforcement reliability matters, consider supplementing robots.txt with IP-level blocking. The same gap applies to other lesser-documented training crawlers, such as ByteDance's Bytespider and Common Crawl's CCBot.
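Where public reporting is thin, you can gather your own evidence by replaying access logs against your robots.txt and flagging any request that self-identifies as DeepSeekBot yet fetched a disallowed path. The sketch below assumes logs in the Apache/Nginx "combined" format and the "DeepSeekBot" user-agent token; the log lines are invented, using RFC 5737 documentation addresses. A hit shows only that something *claiming* to be DeepSeekBot ignored the rule, so check the source IPs before drawing conclusions.

```python
import re
from urllib.robotparser import RobotFileParser

# The robots.txt that was live during the period covered by the logs.
ROBOTS_TXT = """\
User-agent: DeepSeekBot
Disallow: /account/
Disallow: /api/
"""

# Combined log format:
# ip ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def find_violations(log_lines, robots_txt, ua_token="DeepSeekBot"):
    """Yield (ip, path) for requests that self-identify with ua_token
    but fetched a path that robots_txt disallows for that agent."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    for line in log_lines:
        m = LOG_RE.match(line)
        if m is None or ua_token.lower() not in m["ua"].lower():
            continue  # unparseable line, or not a declared DeepSeekBot request
        if not rp.can_fetch(ua_token, m["path"]):
            yield m["ip"], m["path"]


UA = "Mozilla/5.0 (compatible; DeepSeekBot/1.0)"  # illustrative user-agent
SAMPLE_LOG = [
    f'203.0.113.7 - - [22/Jun/2026:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "{UA}"',
    f'203.0.113.7 - - [22/Jun/2026:10:00:05 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "{UA}"',
    f'203.0.113.7 - - [22/Jun/2026:10:00:09 +0000] "GET /account/settings HTTP/1.1" 200 2048 "-" "{UA}"',
    '198.51.100.4 - - [22/Jun/2026:10:01:00 +0000] "GET /account/settings HTTP/1.1" 200 2048 "-" "Mozilla/5.0 Chrome/126.0"',
]

if __name__ == "__main__":
    for ip, path in find_violations(SAMPLE_LOG, ROBOTS_TXT):
        print(f"disallowed fetch: {ip} {path}")
```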


Data Sovereignty Considerations

Quick answer: DeepSeek is incorporated in China and operates under Chinese law. Content collected by its crawler may be subject to data access requirements that apply to Chinese technology companies. For organisations in regulated industries or with explicit data governance policies, this distinction carries compliance weight beyond what it would for a US-based crawler.

This is not a claim that DeepSeek actively misuses data. It is a statement about jurisdiction and the legal framework under which collected data exists. Organisations that maintain policies restricting data transfer to certain jurisdictions, or that have IP concerns about AI training data origin, have legitimate technical and legal reasons to block DeepSeekBot specifically rather than as part of a blanket AI crawler policy.

Security teams in financial services, healthcare, government contractors, and technology companies with proprietary IP have been among the earliest to add DeepSeekBot to their crawler blocklists for exactly this reason.


IP-Level Blocking for DeepSeekBot

Quick answer: Where DeepSeek publishes IP ranges for its crawler, adding them to your firewall or CDN provides enforcement that does not depend on robots.txt compliance. Given the lower compliance certainty compared to US-based crawlers, IP blocking is the more reliable approach for organisations with strict requirements.

To implement IP-level blocking:

  1. Locate DeepSeek's current published IP ranges from their official documentation
  2. Add these ranges to your firewall, CDN, or reverse proxy deny list
  3. Set a review cycle for updates, as IP ranges expand with crawl infrastructure growth
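In production this check belongs in the firewall, CDN, or reverse proxy from step 2, but the matching logic is simple enough to sketch in Python, which can help sanity-check a list before it is deployed. The ranges below are RFC 5737 and RFC 3849 documentation placeholders, not DeepSeek's ranges; substitute ranges you have sourced and verified.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# PLACEHOLDER ranges (documentation blocks). These are NOT DeepSeek's ranges.
DENY_RANGES = [
    "192.0.2.0/24",
    "198.51.100.0/24",
    "2001:db8::/32",
]


def build_deny_list(cidrs: Iterable[str]) -> List[Network]:
    """Parse CIDR strings; strict=True rejects entries with host bits set (likely typos)."""
    return [ipaddress.ip_network(c.strip(), strict=True) for c in cidrs]


def is_denied(client_ip: str, networks: List[Network]) -> bool:
    """Return True if client_ip falls inside any deny-listed network.

    An IPv4 address tested against an IPv6 network (or vice versa) is
    simply reported as not contained, so mixed lists are safe.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    nets = build_deny_list(DENY_RANGES)
    for ip in ["198.51.100.42", "203.0.113.9", "2001:db8::1"]:
        print(ip, "denied" if is_denied(ip, nets) else "allowed")
```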

As with all crawler IP lists, this requires ongoing maintenance. A quarterly review cycle is sufficient for most organisations.
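A review is quicker when you can see exactly what changed since last time. This sketch compares two snapshots of a range list, for example last quarter's saved copy and the current one. The snapshot format (plain CIDR strings) is an assumption, and the values are documentation placeholders.

```python
import ipaddress
from typing import Iterable, List, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def diff_ranges(previous: Iterable[str], current: Iterable[str]) -> Tuple[List[Network], List[Network]]:
    """Return (added, removed) networks between two review snapshots.

    Comparison is by exact network, so re-splitting a range (one /24
    republished as two /25s) shows up as both removed and added.
    """
    prev = {ipaddress.ip_network(c) for c in previous}
    curr = {ipaddress.ip_network(c) for c in current}
    # get_mixed_type_key lets IPv4 and IPv6 networks sort together without TypeError.
    added = sorted(curr - prev, key=ipaddress.get_mixed_type_key)
    removed = sorted(prev - curr, key=ipaddress.get_mixed_type_key)
    return added, removed


if __name__ == "__main__":
    last_quarter = ["192.0.2.0/24", "198.51.100.0/24"]
    this_quarter = ["192.0.2.0/24", "203.0.113.0/24", "2001:db8::/32"]
    added, removed = diff_ranges(last_quarter, this_quarter)
    print("added:", [str(n) for n in added])
    print("removed:", [str(n) for n in removed])
```

Reviewing the diff before applying it makes an unexpectedly large change (or an accidental removal) visible instead of silently pushing it to the deny list.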


DeepSeekBot vs. DeepSeek-Powered Agents

Quick answer: Blocking DeepSeekBot addresses DeepSeek's training data pipeline. If DeepSeek builds or enables agentic AI products that browse the web on users' behalf, those sessions would not be DeepSeekBot and would not be affected by your robots.txt rules.

DeepSeek's public product focus has been on language model capabilities rather than agentic browsing tools, but this is an evolving space. The structural gap applies here as it does to OpenAI and Anthropic: the declared crawler and any future interactive agents are separate systems.

Organisations that want comprehensive protection against all DeepSeek-related automated access to their sites should monitor DeepSeek's product announcements for agentic products, particularly any browser-use or computer-use capabilities that would create undeclared browser sessions. Browser-layer detection covers those scenarios; robots.txt does not.


Browser-Layer Detection: Beyond the Declared Crawler

Quick answer: Blocking DeepSeekBot addresses DeepSeek's declared training crawler. It does not address DeepSeek-powered agents or applications that browse your site in real browser sessions on behalf of users. Those sessions require browser-layer behavioural detection, not robots.txt rules.

Any DeepSeek-powered tool that uses real browser automation would present as a standard browser session with no connection to DeepSeekBot's declared user-agent, so your robots.txt block would be irrelevant to that traffic. The same blind spot affects content protection more broadly, which is why blocking AI content scrapers increasingly depends on behaviour rather than self-declaration.

To understand what that gap looks like in practice: imagine a DeepSeek-powered research agent tasked with compiling competitor intelligence on a SaaS vendor. It opens a headless Chromium session, navigates the site's pricing and documentation pages in sequence, and extracts structured data. The session presents a legitimate Chrome fingerprint sourced from a data centre in a non-Chinese jurisdiction, so neither the IP origin nor the user-agent triggers any filter. The agent completes a full audit of six pages in under 40 seconds, with zero dwell time on images and no scroll-back behaviour. Those interaction anomalies are only visible at the browser layer. In cside's controlled testing, traditional tools missed AI agents operating inside real browser sessions in 81 out of 100 scenarios, precisely because network-layer tools see a clean request and stop there.
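The anomalies in that scenario (many pages in a short window, no image dwell, no scroll-back) can be expressed as simple session features. The sketch below is a toy heuristic for illustration only: the `SessionStats` fields, the threshold, and the signal names are hypothetical choices, not cside's detection method, and a production system would combine far more signals collected in the browser.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold for illustration; not a tuned or recommended value.
MAX_PAGES_PER_MINUTE = 6.0


@dataclass
class SessionStats:
    """Hypothetical per-session features gathered by client-side instrumentation."""
    pages_viewed: int
    duration_seconds: float
    image_dwell_ms: int      # total time images spent in the viewport
    scroll_back_events: int  # number of upward scrolls


def anomaly_signals(s: SessionStats) -> List[str]:
    """Return the names of the toy heuristics this session trips."""
    signals: List[str] = []
    minutes = max(s.duration_seconds, 1.0) / 60.0  # floor avoids division by zero
    if s.pages_viewed / minutes > MAX_PAGES_PER_MINUTE:
        signals.append("high_page_rate")
    if s.pages_viewed > 1 and s.image_dwell_ms == 0:
        signals.append("no_image_dwell")
    if s.pages_viewed > 1 and s.scroll_back_events == 0:
        signals.append("no_scroll_back")
    return signals


if __name__ == "__main__":
    # Roughly the agent from the scenario: six pages in under 40 seconds.
    agent = SessionStats(pages_viewed=6, duration_seconds=38.0, image_dwell_ms=0, scroll_back_events=0)
    human = SessionStats(pages_viewed=4, duration_seconds=300.0, image_dwell_ms=12_000, scroll_back_events=3)
    print("agent:", anomaly_signals(agent))
    print("human:", anomaly_signals(human))
```

No single signal is conclusive; a person skimming quickly can trip one. The point of the sketch is that none of these features exist in a network-layer request log, only in the browser session itself.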


More broadly, the data sovereignty concern that makes DeepSeekBot worth blocking applies equally to any AI-powered session accessing your site from infrastructure in jurisdictions with different data governance frameworks. cside's browser-layer monitoring surfaces named and unnamed agents by behavioural signal rather than self-declaration, including sessions that present no identifying information at all.

Mike Kutlu
Client-Side Security Consultant

Client-side security consultant at cside. 10+ years of experience implementing technology solutions for enterprises (previously at Oracle, Cloudflare, and Splunk). Now helping teams use client-side intelligence to catch & reduce fraud.


Frequently Asked Questions

What is DeepSeekBot?

DeepSeekBot is the web crawler operated by DeepSeek, a Chinese AI company that develops large language models. It collects publicly available web content to train DeepSeek's AI systems. It uses a declared user-agent string and is designed to respect robots.txt directives. DeepSeek operates under Chinese law and data regulations.

How do I block DeepSeekBot?

Add User-agent: DeepSeekBot followed by Disallow: / to your robots.txt file to block it from your entire site. For path-level control, use specific Disallow rules. Given DeepSeekBot's less-documented compliance record compared to GPTBot or ClaudeBot, supplementing robots.txt with IP-level blocking is worth considering.

Why would I block DeepSeekBot separately from other AI crawlers?

DeepSeek is a Chinese company operating under Chinese jurisdiction and data law. Organisations with policies restricting data transfer to certain jurisdictions, or with regulatory requirements that govern where their data can be accessed, have specific compliance reasons to block DeepSeekBot independently of a general AI crawler policy.

What does blocking DeepSeekBot actually change?

Blocking DeepSeekBot prevents your content from being collected in future training crawls. It does not remove content collected before the block was added from existing datasets or from models already trained on it. Blocking the crawler also does not affect any DeepSeek-powered products or agents that browse the web through browser sessions rather than the declared crawler.

Does DeepSeekBot respect robots.txt?

DeepSeekBot is designed to respect robots.txt directives, but its compliance track record is less thoroughly documented in independent reporting than that of GPTBot (OpenAI) or ClaudeBot (Anthropic). Organisations with strict requirements should consider IP-level blocking as an enforcement complement to robots.txt. A quarterly review of DeepSeek's published IP ranges keeps that enforcement layer current.
