How to Block PerplexityBot on Your Website

PerplexityBot crawls your content for AI search results. Learn how to block it, why it faced copyright criticism, and how Perplexity Shopper differs.

Jun 25, 2026 5 min read

PerplexityBot is the declared web crawler that powers Perplexity's AI search engine. When a user queries Perplexity, the search results draw from content PerplexityBot has indexed. In 2024, multiple publishers reported that Perplexity was reproducing copyrighted content from their sites in search results despite robots.txt blocks, which made PerplexityBot one of the more controversial AI crawlers.

This guide covers PerplexityBot specifically. If you are trying to control Perplexity's shopping agent, see our companion post on how to block Perplexity Shopper, because it requires a different approach entirely. For the broader pattern across declared crawlers, see our guide to blocking AI agent content-scraping bots.


What Is PerplexityBot?

Quick answer: PerplexityBot is Perplexity's AI search crawler. It indexes web content to power Perplexity's AI-generated search results. It identifies itself with a declared user-agent string and is documented at docs.perplexity.ai. In 2024, it faced significant criticism from publishers for apparent robots.txt non-compliance and content reproduction without sufficient attribution.

PerplexityBot's user-agent: PerplexityBot/1.0 (+https://docs.perplexity.ai/docs/perplexitybot)

The 2024 controversy is relevant context for your blocking decision. Multiple major publishers, including media outlets and news organisations, reported that Perplexity was surfacing detailed reproductions of their paywalled or robots.txt-restricted content in AI search answers. Perplexity disputed some of these characterisations, but the episode established that PerplexityBot's compliance is more actively contested than GPTBot's or ClaudeBot's.


The 2024 Compliance Controversy

Quick answer: In 2024, Wired, The Atlantic, and other publishers reported that Perplexity was reproducing content from their sites in AI search results despite robots.txt rules that disallowed PerplexityBot. Perplexity's explanations at the time were inconsistent, leading several publishers to take additional technical and legal steps.

The specific concern was not just crawling; it was summarisation and reproduction. Even if PerplexityBot honoured robots.txt for its direct crawl, Perplexity could access and summarise the same content through other means: cached copies, third-party data sources, or live browsing infrastructure. From the publishers' perspective, the net result was that their content appeared in Perplexity answers regardless of their robots.txt settings.

This does not mean robots.txt blocking is pointless for PerplexityBot. It means the scope of what robots.txt can achieve against a search product with multiple content acquisition channels is limited. IP-level blocking and active monitoring provide more reliable enforcement.
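As one way to approach the active monitoring mentioned above, the following minimal sketch counts which paths the declared crawler requested, based on access log lines. It assumes the common "combined" log format; the sample lines, the `LOG_LINE` pattern, and the `perplexitybot_hits` helper are illustrative and not part of any Perplexity or cside tooling. It only sees requests that declare the PerplexityBot user-agent, so it cannot surface content acquired through other channels.

```python
import re
from collections import Counter

# Extracts the request path and user-agent from a combined-format log line.
# Assumption: your server logs in the combined format; adjust the pattern otherwise.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def perplexitybot_hits(lines):
    """Count requested paths on lines whose user-agent mentions PerplexityBot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "perplexitybot" in match.group("ua").lower():
            hits[match.group("path")] += 1
    return hits


# Hypothetical sample lines, using documentation-reserved IP addresses.
sample = [
    '203.0.113.7 - - [25/Jun/2026:10:00:00 +0000] "GET /premium/report HTTP/1.1" '
    '200 5120 "-" "PerplexityBot/1.0 (+https://docs.perplexity.ai/docs/perplexitybot)"',
    '198.51.100.4 - - [25/Jun/2026:10:00:05 +0000] "GET /index.html HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    for path, count in perplexitybot_hits(sample).most_common():
        print(path, count)
```

Requests to disallowed paths that keep appearing after a robots.txt block is in place are a signal worth investigating before escalating to IP-level rules.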


How to Block PerplexityBot with robots.txt

Quick answer: Add PerplexityBot to your robots.txt. Given the 2024 compliance controversy, also implement IP-level blocking and consider adding legal language to your terms of service explicitly restricting AI training data collection and AI search summarisation.

To block PerplexityBot from your entire site:

User-agent: PerplexityBot
Disallow: /

For path-level control:

User-agent: PerplexityBot
Disallow: /premium/
Disallow: /members/
Disallow: /api/
Allow: /public/

Given the 2024 controversy, treat robots.txt as a signal of intent rather than a hard technical control for PerplexityBot. The same declared-crawler approach is more dependable for crawlers with cleaner compliance histories, such as CCBot.
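Before deploying rules like the ones above, it can help to confirm they parse the way you expect. The sketch below uses Python's standard-library `urllib.robotparser` to evaluate the path-level example; the `allowed` helper and the test paths are illustrative. It checks what a compliant parser would conclude from the file, not how PerplexityBot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The path-level rules from this section, verbatim.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /premium/
Disallow: /members/
Disallow: /api/
Allow: /public/
"""


def allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if a compliant parser would let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/premium/report", "/public/about", "/blog/post"):
        print(path, allowed("PerplexityBot", path))
```

Paths under `/premium/`, `/members/`, and `/api/` should come back as disallowed for PerplexityBot, while paths that match no rule remain allowed. Other user-agents are unaffected because the file contains no group that applies to them.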


IP-Level Blocking

Quick answer: Perplexity publishes PerplexityBot's IP ranges in its documentation. Denying these ranges at the firewall or CDN level provides enforcement independent of whether the crawler reads robots.txt. For publishers or content-heavy sites, IP blocking is the more reliable approach given the compliance history.

Find Perplexity's current IP ranges in their official documentation at docs.perplexity.ai. Add them to your firewall, CDN edge configuration, or reverse proxy deny rules. Review the list quarterly, because crawler IP ranges tend to expand as crawl volume grows.
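If you enforce the deny list in application code rather than at the firewall or CDN, the check can be sketched with Python's standard `ipaddress` module. The CIDR blocks below are placeholders taken from ranges reserved for documentation; they are not Perplexity's ranges, and `DENY_RANGES` and `is_denied` are hypothetical names. Substitute the list published in Perplexity's documentation.

```python
import ipaddress

# PLACEHOLDER ranges reserved for documentation (RFC 5737, RFC 3849).
# These are NOT Perplexity's ranges; replace them with the published list.
DENY_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_denied(client_ip, ranges=DENY_RANGES):
    """Return True if client_ip falls inside any denied network."""
    addr = ipaddress.ip_address(client_ip)
    # Compare only against networks of the same IP version.
    return any(addr.version == net.version and addr in net for net in ranges)


if __name__ == "__main__":
    for ip in ("192.0.2.55", "203.0.113.9", "2001:db8::1"):
        print(ip, is_denied(ip))
```

Enforcing at the firewall or CDN edge is usually preferable, because the request is rejected before it reaches your application. A check like this is mainly useful as a fallback or for logging.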


PerplexityBot vs. Perplexity Shopper: A Critical Distinction

Quick answer: PerplexityBot (the indexing crawler) and Perplexity Shopper (the transacting agent) are separate systems. Blocking PerplexityBot does not affect Perplexity Shopper. Shopper uses a real browser session with a standard Chrome user-agent. It requires browser-layer detection, not robots.txt blocking.

| System | Purpose | User-agent | Detection approach |
| --- | --- | --- | --- |
| PerplexityBot | Crawls and indexes content | PerplexityBot/1.0 (declared) | robots.txt + IP blocking |
| Perplexity Shopper | Completes purchases for users | Standard Chrome (undeclared) | Browser-layer behavioural signals |

Engineers who add PerplexityBot to robots.txt and consider the Perplexity problem solved have addressed only one of the two systems. Perplexity Shopper is invisible to every control in the blocklist approach. In cside's controlled testing, traditional tools missed AI agents in 81 out of 100 scenarios, and Shopper is exactly the kind of session those tools miss.

What that looks like in practice: a Perplexity Shopper session tasked with buying a specific product opens a real Chrome session, navigates to a retailer's category page, filters by the requested specification, selects a product, and proceeds to checkout. Every network-layer signal is clean: a residential IP, a standard TLS handshake, and a Chrome user-agent string indistinguishable from a human shopper. The behavioural tell is in the browser layer. The agent moves through product filtering with no cursor variance, selects the first qualifying result without pausing to compare alternatives, and enters address data at a uniform keystroke interval with no correction events. cside's AI agent detection instrumentation captures those interaction-layer anomalies before any checkout event fires, giving operators visibility the network layer never provides.
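To make one of these signals concrete, the sketch below scores how uniform a session's keystroke timing is and combines that with the absence of correction events. It is an illustration of the general idea only, not cside's detection method. The function names, the coefficient-of-variation measure, and the `cv_threshold` value are all assumptions, and a real system would weigh many more signals.

```python
from statistics import mean, pstdev


def interval_cv(timestamps_ms):
    """Coefficient of variation of the gaps between consecutive keystrokes.

    Returns None when there are too few keystrokes to judge.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None
    return pstdev(gaps) / mean(gaps)


def looks_scripted(timestamps_ms, correction_events, cv_threshold=0.05):
    """Flag near-uniform typing with no corrections.

    cv_threshold is a hypothetical cut-off chosen for illustration.
    """
    cv = interval_cv(timestamps_ms)
    return cv is not None and cv < cv_threshold and correction_events == 0


if __name__ == "__main__":
    uniform = [0, 50, 100, 150, 200]            # identical 50 ms gaps
    irregular = [0, 120, 190, 420, 480, 800]    # varied gaps, as a person might type
    print(looks_scripted(uniform, correction_events=0))
    print(looks_scripted(irregular, correction_events=2))
```

A single heuristic like this produces false positives on its own (password managers and autofill also type uniformly), so treat it as one input among several rather than a verdict.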


What PerplexityBot Blocking Actually Achieves

Quick answer: A PerplexityBot block prevents the declared crawler from directly indexing your content in future crawl runs. It does not prevent Perplexity from referencing previously indexed content, accessing your content through third-party sources, or surfacing summarisations in AI search results through channels other than direct crawling.

This is the limitation the 2024 controversy exposed. Robots.txt blocks a specific crawler from making new requests. It does not scrub existing indexed content from a search product's knowledge base, and it does not prevent content acquisition through alternative channels that the crawler itself does not directly use.

For organisations with strict requirements (paywalled content, proprietary research, licensed material), the combination of robots.txt, IP blocking, legal TOS language, and technical content protection such as authentication walls and dynamic rendering provides a more complete protection posture than any single approach.

Mike Kutlu
Client-Side Security Consultant

Client-side security consultant at cside. 10+ years of experience implementing technology solutions for enterprises (previously at Oracle, Cloudflare, and Splunk). Now helping teams use client-side intelligence to catch & reduce fraud.

Frequently Asked Questions

PerplexityBot is Perplexity's web crawler that indexes content for its AI search engine. When users query Perplexity, the AI-generated answers draw from content PerplexityBot has collected. In 2024, multiple publishers reported compliance issues where their robots.txt-restricted content appeared in Perplexity answers despite explicit bot blocking.

Add `User-agent: PerplexityBot` followed by `Disallow: /` to your `robots.txt` file. Given the 2024 compliance controversy, supplement this with IP-level blocking using Perplexity's published IP ranges from their crawler documentation. Treat `robots.txt` as a signal of intent rather than a hard technical control for this specific crawler.

Multiple publishers reported in 2024 that Perplexity was surfacing detailed summaries of their content in AI search results despite robots.txt blocks on PerplexityBot. Perplexity disputed aspects of these reports. The episode was documented in coverage from Wired, The Atlantic, and other outlets, and it established that PerplexityBot's compliance is more actively contested than most other major AI crawlers.

PerplexityBot is an indexing crawler with a declared user-agent. Perplexity Shopper is a transacting agent that uses a real browser session and presents a standard Chrome user-agent. Blocking PerplexityBot has no effect on Perplexity Shopper. Shopper sessions require browser-layer behavioural detection to identify and control.

Legal strategies vary by jurisdiction and the type of content involved. Adding explicit terms-of-service language that prohibits AI training data collection and AI search summarisation creates a legal basis for enforcement that supplements technical blocking. Publishers have pursued both TOS-based and copyright-based legal arguments in the 2024 to 2025 period. This is an active legal area and specific guidance depends on jurisdiction and content type.
