How to Block GPTBot (and Why You Might Not Want To)

GPTBot crawls your site to train OpenAI models. Here is how to block it with robots.txt and IP ranges, plus what that block still leaves uncovered.

Jun 24, 2026 • 6 min read

Mike Kutlu Client-Side Security Consultant

GPTBot is OpenAI's training crawler. It visits public web pages, collects content, and uses that content to train future versions of ChatGPT and other OpenAI models. It is distinct from OpenAI Operator (which transacts on a user's behalf) and OAI-SearchBot (which surfaces sites in ChatGPT's search features). Knowing which OpenAI system is visiting your site determines the right response.

Blocking GPTBot with robots.txt is straightforward and widely documented. The more important question is whether blocking the crawler changes what OpenAI's agents can do on your site, and the answer, for transacting agents like Operator, is no. For the broader pattern across AI scrapers, see our guide to blocking AI agent content-scraping bots.

What Is GPTBot?

Quick answer: GPTBot is a declared web crawler operated by OpenAI. Its purpose is to collect publicly available web content for use in training AI models. It identifies itself with a known user-agent string and operates from published IP ranges. OpenAI states that GPTBot respects robots.txt directives.

GPTBot's user-agent string:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.1; +https://openai.com/gptbot)

OpenAI publishes GPTBot's current IP ranges in its bot documentation. The crawler visits pages, reads text content, and does not execute JavaScript in the same way a real browser does. It is a traditional HTTP crawler, not an interactive agent.
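
Because GPTBot declares itself, a server-side check on the User-Agent header is enough to label its requests in your logs. The following is a minimal sketch in Python, assuming the token format shown in the user-agent string above. The function name `gptbot_version` is hypothetical, and because any client can send this header, a match should be treated as a claim rather than proof of origin.

```python
import re
from typing import Optional

# Matches the "GPTBot/<major>.<minor>" token from the user-agent string above.
GPTBOT_TOKEN = re.compile(r"\bGPTBot/(\d+\.\d+)", re.IGNORECASE)


def gptbot_version(user_agent: str) -> Optional[str]:
    """Return the declared GPTBot version, or None if the token is absent."""
    match = GPTBOT_TOKEN.search(user_agent)
    return match.group(1) if match else None


if __name__ == "__main__":
    declared = (
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
        "GPTBot/1.1; +https://openai.com/gptbot)"
    )
    ordinary = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    print(gptbot_version(declared))
    print(gptbot_version(ordinary))
```

Pairing this header check with the published IP ranges gives a stronger signal than either one alone.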

How to Block GPTBot with robots.txt

Quick answer: Add GPTBot to your robots.txt with a Disallow: / directive to block it from your entire site. OpenAI states it respects these directives. For path-level control, use specific Disallow rules to restrict access to sensitive sections while allowing GPTBot on public content.

To block GPTBot from your entire site:

User-agent: GPTBot
Disallow: /

To block GPTBot from specific paths only:

User-agent: GPTBot
Disallow: /private/
Disallow: /checkout/
Disallow: /account/
Allow: /blog/
Allow: /products/

OpenAI honours these directives for the declared GPTBot crawler. There is no technical enforcement mechanism; robots.txt is a declaration that compliant crawlers choose to follow. But GPTBot has a strong compliance record compared to some other AI crawlers that have been publicly criticised for ignoring robots.txt directives. The same robots.txt approach works for other declared crawlers, including CCBot.
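
Before deploying path-level rules, it can help to check how a parser reads them. The sketch below feeds the path-level example above to Python's standard `urllib.robotparser` and asks which URLs the `GPTBot` token may fetch. The `example.com` host and the helper name `is_allowed` are placeholders, and Python's parser is only one interpretation of the rules, so this is a sanity check rather than a guarantee of how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The path-level rules from the example above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/
Disallow: /checkout/
Disallow: /account/
Allow: /blog/
Allow: /products/
"""


def is_allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for path in ("/checkout/cart", "/blog/post", "/about"):
        print(path, is_allowed("GPTBot", "https://example.com" + path))
```

Paths that match no rule, such as `/about` here, are allowed by default. The `Allow` lines document intent, and the `Disallow` lines are what restrict access.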

IP-Level Blocking for GPTBot

Quick answer: OpenAI publishes GPTBot's IP ranges, which you can deny at your firewall or CDN. This provides an enforcement layer beyond robots.txt. It does not require the crawler to self-identify, which makes it more reliable than user-agent matching alone.

If you need hard enforcement rather than a declaration, add GPTBot's published IP ranges to your blocklist at the infrastructure level. This is the more reliable approach for high-value content because:

It does not depend on the crawler honouring robots.txt
It catches misconfigured or older GPTBot versions that may not read your robots.txt correctly
It provides a server-level log you can audit

OpenAI's published IP ranges change periodically, so this blocklist requires maintenance. Check OpenAI's bot documentation for the current list.
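
Matching a client address against a CIDR blocklist can be sketched with Python's standard `ipaddress` module. The ranges below are reserved documentation blocks (RFC 5737) used purely as placeholders. They are not OpenAI's ranges, and the function names are hypothetical. In production this check would normally live in your firewall or CDN rules rather than in application code.

```python
import ipaddress

# Placeholder CIDR blocks from the reserved documentation ranges (RFC 5737).
# These are NOT OpenAI's ranges; substitute the currently published list.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def build_blocklist(cidrs):
    """Parse CIDR strings into network objects once, at load time."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_blocked(client_ip: str, blocklist) -> bool:
    """Return True if the client address falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in blocklist)


if __name__ == "__main__":
    blocklist = build_blocklist(PLACEHOLDER_RANGES)
    for ip in ("192.0.2.17", "203.0.113.5"):
        print(ip, is_blocked(ip, blocklist))
```

Rebuilding the blocklist on a schedule from the published source keeps the maintenance burden described above manageable.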

Why Blocking GPTBot Is Not Enough

Quick answer: GPTBot is OpenAI's training crawler. Blocking it does not affect OpenAI Operator (the transacting agent), OAI-SearchBot (the crawler behind ChatGPT's search features), or any future OpenAI agentic system. Each operates independently with different user-agents, IP ranges, and behavioural profiles.

This is the distinction most engineers miss. A site owner who blocks GPTBot typically believes they have addressed "OpenAI's access to their content." They have addressed one OpenAI system out of several. Operator, ChatGPT's live browsing, and future agentic products are separate systems that GPTBot blocking does not touch.

The deeper issue is that GPTBot is a cooperative, declared crawler. You can block it because OpenAI tells you what it looks like. The more disruptive agents (undeclared, browser-based, transacting) are the ones that don't identify themselves and don't respect robots.txt in any meaningful sense. Blocking GPTBot addresses the visible, cooperative threat while leaving the invisible, uncooperative threats unaddressed. The same structural gap applies to other agentic systems, including OpenAI Operator, and to the equivalent split between ClaudeBot and Claude-powered agents.
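
For declared crawlers, the narrow scope of a GPTBot rule is easy to confirm. The sketch below uses Python's standard `urllib.robotparser` as a stand-in for a compliant crawler and evaluates a GPTBot-only block against several user-agent tokens named in this article. The helper name `blocked_agents` and the `example.com` URL are hypothetical. Undeclared, browser-based agents never consult this file, so they are outside what this check can show.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks only the GPTBot token.
GPTBOT_ONLY = """\
User-agent: GPTBot
Disallow: /
"""


def blocked_agents(robots_txt: str, agents, url: str):
    """Return the subset of `agents` that may not fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    tokens = ["GPTBot", "OAI-SearchBot", "CCBot"]
    print(blocked_agents(GPTBOT_ONLY, tokens, "https://example.com/products/"))
```

Only the token named in the rule is affected. Every other declared crawler needs its own `User-agent` group.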

What Browser-Layer Detection Adds

Quick answer: GPTBot itself doesn't require browser-layer detection because it is visible at the network layer. The agents that follow GPTBot's work (OpenAI Operator, agentic shopping agents) are not. Browser-layer detection closes the gap between the crawlers you can see and the agents you cannot.

cside is not primarily needed to detect GPTBot. You can block it with two lines of robots.txt. cside addresses the agents that operate inside real browser sessions: the ones that execute JavaScript, interact with your UI, and create sessions that look identical to legitimate human users at the network layer.

The signals cside observes (interaction timing, fingerprint consistency, navigation patterns, behavioural cadence) are irrelevant for a simple HTTP crawler like GPTBot. They are essential for detecting Operator, agentic shoppers, and the undeclared automated sessions that robots.txt cannot stop. In cside's controlled testing, traditional tools missed AI agents operating inside real browser sessions in 81 out of 100 scenarios.

Consider what this looks like in practice. An OpenAI Operator session targeting a retail site does not announce itself in any header. It launches a Chromium-based browser, loads the page with full JavaScript execution, accepts cookies, navigates the category tree at a plausible reading pace, adds items to the cart, and proceeds to checkout. At the network layer, every signal looks like a logged-in customer: the IP belongs to a residential proxy pool, the TLS fingerprint matches a current browser version, and the session cookie is valid.

What changes is the sub-layer behaviour: pointer events arrive with machine-precise spacing, scroll depth increments in consistent pixel intervals, and the time-on-page distribution for each product page clusters at a value far tighter than any human browsing population produces. cside's browser-layer instrumentation captures those signals and surfaces the session as automated before checkout is reached. A WAF, CDN rule, or user-agent filter sees nothing out of the ordinary. The same approach applies to undeclared AI content scrapers and other crawlers that mimic real browsers.
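
To make "machine-precise spacing" concrete, one simple statistic is the coefficient of variation (standard deviation divided by mean) of the gaps between consecutive pointer events. The sketch below is an illustration only. It is not cside's detection model, and the function names, the sample timestamps, and the 0.05 threshold are all hypothetical choices made for the example.

```python
from statistics import mean, pstdev


def interval_cv(timestamps_ms):
    """Coefficient of variation of the gaps between consecutive events.

    Returns None when there are too few events to measure spread.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None
    return pstdev(gaps) / mean(gaps)


def looks_machine_timed(timestamps_ms, cv_threshold=0.05):
    """Flag event streams whose spacing is almost perfectly regular.

    The threshold is an illustrative assumption, not a tuned value.
    """
    cv = interval_cv(timestamps_ms)
    return cv is not None and cv < cv_threshold


if __name__ == "__main__":
    scripted = [0, 16, 32, 48, 64, 80]       # perfectly even 16 ms gaps
    human_like = [0, 14, 41, 52, 97, 120]    # irregular gaps
    print(looks_machine_timed(scripted))
    print(looks_machine_timed(human_like))
```

A single statistic like this is easy for an agent to defeat by adding jitter, which is why the paragraph above describes several signals being combined.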

Should You Block GPTBot?

Quick answer: That depends on your relationship with OpenAI's products. Blocking GPTBot prevents your content from being used in training future models. It does not prevent ChatGPT from referencing your site via live browsing, and it does not prevent Operator from transacting on your site. Consider what you're actually trying to achieve before deciding.

Reasons to block GPTBot:

You don't want your proprietary content in OpenAI's training datasets
You have competitive concerns about your content being surfaced through ChatGPT answers
Your terms of service explicitly restrict automated data collection for AI training

Reasons not to block it (or to think carefully first):

Your content already benefits from ChatGPT citations in search results and AI answers
You want your brand and products to be well-represented in ChatGPT's knowledge base
Future agentic shopping systems trained on your product data may generate referral traffic

The implications of blocking AI crawlers for SEO and GEO (generative engine optimisation) are still being worked out by the industry. A site that blocks all AI training crawlers today may find its products absent from AI-driven recommendation systems tomorrow.

Client-Side Security Consultant Mike Kutlu

Client-side security consultant at cside. 10+ years of experience implementing technology solutions for enterprises (previously at Oracle, Cloudflare, and Splunk). Now helping teams use client-side intelligence to catch & reduce fraud.

Frequently Asked Questions

What is GPTBot?

GPTBot is OpenAI's web crawler that collects publicly available web content to train AI models, including future versions of ChatGPT. It identifies itself with a known user-agent string, operates from published IP ranges, and is designed to respect `robots.txt` directives. It is an HTTP crawler that does not execute JavaScript or interact with web application interfaces.

How do I block GPTBot?

Add `User-agent: GPTBot` followed by `Disallow: /` to your `robots.txt` file to block GPTBot from your entire site. For path-level control, use specific `Disallow` rules to restrict access to sensitive sections. OpenAI has stated that GPTBot respects these directives.

Does blocking GPTBot also block OpenAI Operator?

No. GPTBot and OpenAI Operator are separate systems. Blocking GPTBot prevents the training crawler from visiting your site. It has no effect on Operator, ChatGPT's live browsing assistant, or other OpenAI agentic products. Those systems operate independently with different user-agents and behavioural profiles.

Can I block GPTBot by IP address?

Yes. OpenAI publishes GPTBot's IP ranges in its bot documentation. You can add these ranges to your firewall or CDN deny list for enforcement that does not depend on the crawler reading your `robots.txt` correctly. These IP ranges change periodically and require maintenance.

Does blocking GPTBot remove content OpenAI has already collected?

Blocking GPTBot prevents your content from being used in future training runs. It does not remove content that was already collected before you added the block. ChatGPT's knowledge cutoff and the timing of GPTBot's previous visits to your site determine what OpenAI's models already know about your content.
