
How to Block Applebot-Extended on Your Website

Applebot-Extended is the robots.txt user-agent Apple uses to control whether your content trains Apple Intelligence. Learn how it differs from Applebot and how to opt out via robots.txt.

Jun 23, 2026 6 min read

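Apple uses two user-agent names in robots.txt. The standard Applebot powers Siri, Spotlight Search, and Safari's content suggestions. It has existed for years and behaves like a conventional search engine crawler. Applebot-Extended is newer and was introduced alongside Apple Intelligence. It controls whether the content Applebot collects may be used for AI model training and generative features. According to Apple's documentation, Applebot-Extended does not crawl pages itself. It is a secondary token that Apple checks in your robots.txt.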

Blocking standard Applebot affects your site's performance in Apple's search and discovery products. Blocking Applebot-Extended specifically opts you out of Apple's AI training pipeline without affecting standard Apple search features. The two require separate robots.txt rules. If you are working through the wider list of AI crawlers, the same approach applies to others such as Anthropic's ClaudeBot and Common Crawl's CCBot.


Standard Applebot vs. Applebot-Extended

Quick answer: Standard Applebot is Apple's search and discovery crawler. Applebot-Extended is Apple's AI training opt-out: it controls whether content collected by Applebot is used for Apple Intelligence and Apple's foundation model development. Each has its own user-agent name in robots.txt, and blocking one does not block the other.

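User-agent | Purpose | Where it appears
Applebot | Siri, Spotlight, Safari suggestions, search indexing | robots.txt rules and the crawler's User-Agent header (which contains Applebot/0.1)
Applebot-Extended | Apple Intelligence AI training, generative features | robots.txt rules only; per Apple, it does not crawl pages itself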

This distinction matters because most site owners who want to block AI training data collection do not want to break their relationship with Apple's search and discovery features. Applebot-Extended blocking is surgical: it opts you out of AI training without removing your site from Siri suggestions, Spotlight search results, or Safari content features.


What Is Apple Intelligence and Why Does Applebot-Extended Feed It?

Quick answer: Apple Intelligence is Apple's AI system, announced at WWDC 2024, built into iOS 18, iPadOS 18, and macOS Sequoia. It powers writing assistance, image generation, Siri improvements, and generative features across Apple's device ecosystem. Applebot-Extended determines whether web content crawled by Applebot can be used to train and improve these AI capabilities.

Apple Intelligence runs many features on-device and uses Apple's server infrastructure for more complex tasks. The models powering these features draw on training data from the web. Applebot-Extended is the control that lets you exclude your content from that use. As Apple expands Apple Intelligence (deeper Siri capabilities, better writing suggestions, richer generative features), the range of products this opt-out affects is likely to grow.


How to Block Applebot-Extended (Without Blocking Standard Applebot)

Quick answer: Use separate robots.txt entries for Applebot-Extended and Applebot. A Disallow: / under Applebot-Extended opts your whole site out of AI training use. Leaving Applebot unrestricted preserves your site's presence in Siri, Spotlight, and Safari features.

To block Applebot-Extended while keeping standard Applebot access:

User-agent: Applebot-Extended
Disallow: /

User-agent: Applebot
Allow: /
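One way to sanity-check these rules before deploying them is Python's standard-library urllib.robotparser. The sketch below is a local approximation, not a reproduction of Apple's own parser. It feeds the rules above to RobotFileParser and asks whether each user-agent may fetch a sample URL. The example.com URL and the can_fetch wrapper name are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the example above, verbatim.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: Applebot
Allow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`.

    Uses Python's stdlib parser as a local approximation; Apple's own
    parser may differ in edge cases.
    """
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/pricing"  # placeholder URL
    for agent in ("Applebot-Extended", "Applebot"):
        verdict = "allowed" if can_fetch(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

Python's parser has two quirks worth knowing. It matches user-agent names by substring, and it uses the first group that applies. As a result, a request labelled Applebot-Extended would also match an Applebot group placed earlier in the file. Keeping the Applebot-Extended group at the top, as the example does, avoids that quirk. Under RFC 9309 the group order should not matter.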

Or with path-level restrictions on standard Applebot:

User-agent: Applebot-Extended
Disallow: /

User-agent: Applebot
Disallow: /account/
Disallow: /checkout/
Allow: /
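Python's standard-library urllib.robotparser also offers a quick local check of this path-level variant. The sketch uses it as a stand-in for Apple's parser. It confirms that Applebot is kept out of /account/ and /checkout/ while the rest of the site stays open, and that Applebot-Extended is refused everywhere. The sample paths and the access_matrix helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Path-level example from above: Applebot is restricted on two sections,
# Applebot-Extended is refused everywhere.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: Applebot
Disallow: /account/
Disallow: /checkout/
Allow: /
"""

# Hypothetical paths chosen to exercise each rule.
SAMPLE_PATHS = ["/", "/blog/post", "/account/settings", "/checkout/"]


def access_matrix(robots_txt, agents, paths):
    """Map (agent, path) -> True/False using the stdlib robots.txt parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        (agent, path): parser.can_fetch(agent, "https://example.com" + path)
        for agent in agents
        for path in paths
    }


if __name__ == "__main__":
    matrix = access_matrix(ROBOTS_TXT, ["Applebot", "Applebot-Extended"], SAMPLE_PATHS)
    for (agent, path), allowed in matrix.items():
        print(f"{agent:<18} {path:<20} {'allowed' if allowed else 'blocked'}")
```

Python's parser applies the first rule in a group whose path prefix matches, whereas RFC 9309 applies the most specific (longest) match. Listing the Disallow lines before Allow: /, as the example does, gives the same answer under both interpretations.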

Apple documents this in its official Applebot documentation. The documentation describes Applebot-Extended and provides the opt-out mechanism. It also notes that pages disallowing Applebot-Extended can still be included in Apple's search results.


How to Block Both Applebot Variants

Quick answer: To restrict all of Apple's declared automated access (both standard search and AI training), add both user-agents to your robots.txt. This removes your site from Siri suggestions and Spotlight results as well as from Apple Intelligence training.

User-agent: Applebot-Extended
Disallow: /

User-agent: Applebot
Disallow: /

For most site owners, opting out of Applebot-Extended alone is the better fit. Blocking standard Applebot is a significant decision that reduces your content's discoverability on Apple devices. It is worth scoping the block to the specific user-agent that raises data concerns.
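Blocking standard Applebot is the more consequential choice, so it can help to audit a robots.txt for it before deploying. The sketch below uses two hypothetical helpers, audit_apple_rules and warnings_for, built on Python's urllib.robotparser. They report whether each Apple user-agent can reach the site root and turn that into warnings. Applied to the block-both example, they flag that standard Applebot is blocked too. The site URL and warning wording are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The block-both example from above.
BLOCK_BOTH = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: Applebot
Disallow: /
"""

APPLE_AGENTS = ("Applebot", "Applebot-Extended")


def audit_apple_rules(robots_txt: str, site: str = "https://example.com") -> dict:
    """Report root-level access for each Apple user-agent (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, site + "/") for agent in APPLE_AGENTS}


def warnings_for(access: dict) -> list:
    """Turn an access report into human-readable warnings."""
    warnings = []
    if not access["Applebot"]:
        warnings.append(
            "Applebot is blocked: pages may drop out of Siri, Spotlight and Safari suggestions."
        )
    if access["Applebot-Extended"]:
        warnings.append(
            "Applebot-Extended is allowed: content may be used for Apple AI training."
        )
    return warnings


if __name__ == "__main__":
    for line in warnings_for(audit_apple_rules(BLOCK_BOTH)):
        print(line)
```

An empty robots.txt produces the opposite warning: with no matching group, both user-agents are allowed. In that case, Apple's AI training use is not opted out.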


Why You Might Block Applebot-Extended

Quick answer: The reasons to block Applebot-Extended are similar to reasons to block other AI training crawlers: proprietary content, licensed material, IP concerns, or explicit organizational policy on AI training data. The case for blocking is somewhat simpler because you can do it without breaking Apple search features.

Specific reasons organizations block Applebot-Extended:

  • Licensed content: Publishers whose content is licensed for specific uses may not be permitted to let that content into AI training pipelines without separate authorization
  • Competitive content: Companies with proprietary pricing, product, or research data do not want that data in Apple's AI training corpus
  • Policy compliance: Organizations with explicit data governance policies that restrict AI training data collection
  • Control preference: A general preference for opting out of AI training data programs before the full implications of inclusion are understood

Apple's opt-out mechanism is cleaner than what many AI crawlers offer. It has a dedicated user-agent token that is separate from Apple's search crawler, explicit Apple documentation, and a stated commitment to honor robots.txt. This lets you refuse training use without affecting search. The same robots.txt pattern extends to the broader problem of blocking AI content scrapers: it works for every declared crawler that respects the standard.
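The pattern is identical for each declared crawler, so the opt-out groups can be generated from a list. This is a minimal sketch under two assumptions. First, the token list contains only the names mentioned in this article. Second, opt_out_groups and blocked_tokens are hypothetical helpers. Confirm each vendor's current robots.txt token in its own documentation before relying on the output.

```python
from urllib.robotparser import RobotFileParser

# Only the tokens mentioned in this article; extend after checking each
# vendor's documentation for its current robots.txt token.
AI_TRAINING_TOKENS = ["Applebot-Extended", "ClaudeBot", "CCBot"]


def opt_out_groups(tokens) -> str:
    """Build one 'Disallow: /' group per token, separated by blank lines."""
    groups = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    return "\n\n".join(groups) + "\n"


def blocked_tokens(robots_txt: str, tokens) -> list:
    """Return the tokens that the stdlib parser says may not fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [t for t in tokens if not parser.can_fetch(t, "https://example.com/")]


if __name__ == "__main__":
    robots_txt = opt_out_groups(AI_TRAINING_TOKENS)
    # Search-crawler rules go after the opt-out groups, so Python's
    # first-match parser checks the opt-outs first.
    robots_txt += "\nUser-agent: Applebot\nAllow: /\n"
    print(robots_txt)
```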


Browser-Layer Detection: What Applebot-Extended Blocking Doesn't Cover

Quick answer: Blocking Applebot-Extended controls Apple's training data pipeline. It does not control any future Apple Intelligence agentic products that browse your site on users' behalf, or any other undeclared AI agent operating in a real browser session. Those require browser-layer detection.

Apple's current focus with Apple Intelligence is on-device processing and AI-assisted features. But AI development is trending toward agentic products that browse and transact on behalf of users. Apple may build or enable agents that complete tasks through real browser sessions. Those sessions will not identify themselves as Applebot or Applebot-Extended, and your robots.txt block will not affect them.

cside operates inside the browser session and surfaces the behavioural signals that distinguish machine-executed sessions from human browsing: interaction timing, navigation linearity, fingerprint characteristics, and JavaScript execution patterns. In cside's controlled testing, traditional tools missed AI agents operating inside real browser sessions in 81 out of 100 scenarios. Some organisations need to handle both declared crawlers and undeclared browser agents. For them, robots.txt rules and browser-layer monitoring are complementary: each covers traffic the other cannot see.


Consider what a hypothetical Apple Intelligence agentic task could look like at the browser layer. A user on an iPhone asks Siri to compare subscription plans across two SaaS providers and recommend the cheaper annual option. Siri delegates to an agent that opens a WebKit session, navigates to each pricing page, and extracts the table data. The request arrives with a standard Safari user-agent and a legitimate iOS device fingerprint. There is no Applebot or Applebot-Extended user-agent, because this is not a training crawl; it is an agentic product session.

The agent completes both pricing pages in under 20 seconds. It scrolls programmatically to the pricing section without any exploratory browsing and submits no form interactions. Those behavioural signals (narrow scroll path, zero dwell variance, no return navigation) are invisible at the network layer. Only instrumentation running inside the browser session can surface them. For a deeper look at how agentic sessions evade robots.txt entirely, see our guide to blocking AI agent content scraping bots.
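To make those signals concrete, here is a deliberately simplified scoring sketch. It is not cside's detection logic. The SessionSignals fields, the thresholds and the one-point-per-signal scoring are all hypothetical. They were chosen only to mirror the signals in the scenario: fast completion, near-identical dwell times, a one-directional scroll path and no return navigation.

```python
from dataclasses import dataclass
from statistics import pvariance


@dataclass
class SessionSignals:
    """Hypothetical per-session features gathered by in-browser instrumentation."""
    total_seconds: float        # time from first page load to last action
    dwell_seconds: list[float]  # time spent on each page visited
    scroll_reversals: int       # times the visitor scrolled back up
    back_navigations: int       # uses of back/return navigation


# Hypothetical thresholds for illustration only; real systems tune these on data.
FAST_SESSION_SECONDS = 20.0
LOW_DWELL_VARIANCE = 1.0


def automation_score(s: SessionSignals) -> int:
    """Count how many machine-like signals a session shows (0 to 4)."""
    score = 0
    if s.total_seconds < FAST_SESSION_SECONDS:
        score += 1  # completed unusually quickly
    if len(s.dwell_seconds) >= 2 and pvariance(s.dwell_seconds) < LOW_DWELL_VARIANCE:
        score += 1  # near-identical time on every page
    if s.scroll_reversals == 0:
        score += 1  # narrow, one-directional scroll path
    if s.back_navigations == 0:
        score += 1  # no return navigation
    return score


if __name__ == "__main__":
    agent_like = SessionSignals(18.0, [8.0, 8.5], scroll_reversals=0, back_navigations=0)
    human_like = SessionSignals(95.0, [40.0, 55.0], scroll_reversals=3, back_navigations=1)
    print(automation_score(agent_like), automation_score(human_like))
```

A score is only a prompt for further review. Any single signal has innocent explanations, such as a returning visitor who already knows where the pricing table is.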

Mike Kutlu
Client-Side Security Consultant

Client-side security consultant at cside. 10+ years of experience implementing technology solutions for enterprises (previously at Oracle, Cloudflare, and Splunk). Now helping teams use client-side intelligence to catch & reduce fraud.


Frequently Asked Questions

Applebot-Extended is the robots.txt user-agent Apple introduced for Apple Intelligence. It controls whether content collected by Applebot is used for AI training. Standard Applebot is Apple's search and discovery crawler, used for Siri, Spotlight, and Safari content features. They have different user-agent names and serve different purposes. Blocking Applebot-Extended opts you out of AI training without affecting Apple's standard search and discovery features.

Add `User-agent: Applebot-Extended` followed by `Disallow: /` to your robots.txt file. Leave standard Applebot either unrestricted or with only the path-level restrictions you want. Apple's documentation describes the process and confirms that the two crawlers are controlled independently.

Apple documents the Applebot-Extended opt-out explicitly and states that it honors robots.txt directives for this user-agent. Applebot-Extended governs how Apple uses crawled content rather than fetching pages itself, so it will not appear in your server logs. You can confirm that your rules are written correctly, but you cannot directly observe Apple's compliance. The separate user-agent names make targeted blocking straightforward.

Apple Intelligence is Apple's AI system built into iOS 18, iPadOS 18, and macOS Sequoia, announced at WWDC 2024. It powers writing assistance, image generation, Siri improvements, and generative features across Apple devices. Unless you opt out via Applebot-Extended, web content crawled by Applebot may be used to train and improve these AI capabilities.

No. Blocking Applebot-Extended only affects whether Apple uses your content for AI training. Standard Applebot, which powers Siri, Spotlight, and Safari suggestions, continues to operate unless you separately block the Applebot user-agent. The two are controlled independently through separate robots.txt groups.
