Updated March 2026 · 8 min read

robots.txt for AI Bots: The Complete 2026 Guide

How to control GPTBot, ClaudeBot, PerplexityBot, Bytespider, and 46+ other AI crawlers using your robots.txt file. Includes ready-to-use configurations, per-bot examples, and the most common mistakes to avoid.


What is robots.txt?

robots.txt is a plain text file placed at the root of your website (e.g. https://yoursite.com/robots.txt) that tells web crawlers which pages they can and cannot access. It follows the Robots Exclusion Protocol (REP), first established in 1994 and standardised as RFC 9309 in 2022.

In 2024–2026, robots.txt became the primary mechanism for controlling AI crawler access. Every major AI company — OpenAI, Anthropic, Google, Perplexity, Meta — has published official documentation on how their bots respect robots.txt rules.

Important: robots.txt is a protocol, not a technical barrier. Reputable AI companies honour it. Some less-scrupulous crawlers may not. See bots that ignore robots.txt below.
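In practice, honouring robots.txt means the crawler downloads the file and checks each URL against it before fetching. Here is a minimal sketch of that check using Python's standard-library urllib.robotparser. The rules and the example.com URL are illustrative assumptions, and robotparser applies the first matching rule rather than RFC 9309's longest-match rule, so treat it as an approximation of a real crawler.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt, supplied as a string rather than fetched from a live site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt rules."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and records a "last checked" time;
    # without that timestamp can_fetch() would refuse every URL.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("GPTBot", "Googlebot"):
        # Prints "GPTBot False" then "Googlebot True".
        print(bot, can_crawl(ROBOTS_TXT, bot, "https://example.com/blog/post"))
```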

Two Types of AI Bots — and Why It Matters

Before writing a single robots.txt rule, understand the difference. Blocking the wrong bots has real consequences.

AI Training Bots

Collect your content to train large language models. Your text, code, and writing may appear in future AI model outputs.

GPTBot (OpenAI)
ClaudeBot (Anthropic)
CCBot (Common Crawl)
Bytespider (ByteDance)
cohere-ai (Cohere)
HuggingFaceBot
✓ Blocking these has no SEO impact
AI Search Bots

Index your content so users can find it through AI-powered search engines. Blocking removes you from those results.

PerplexityBot
OAI-SearchBot (SearchGPT)
Google-Extended (Gemini; a control token that also covers Gemini training, not Google Search)
DuckAssistBot
YouBot
BraveBot
⚠️ Blocking these removes you from AI search
The common mistake: blocking all bots with a User-agent: * group containing Disallow: / kills your Google ranking along with the AI crawlers. Always name individual bots, or use one of the templates below.
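A quick way to see the difference is to evaluate both styles with Python's standard-library urllib.robotparser. This is a minimal sketch: the rules and the allowed() helper are illustrative, and robotparser's matching only approximates how each real crawler parses robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The mistake: one wildcard group that disallows everything.
WILDCARD_BLOCK = """\
User-agent: *
Disallow: /
"""

# The fix: name the AI bot and let everyone else through.
TARGETED_BLOCK = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, bot: str, path: str = "/") -> bool:
    """Return True if bot may fetch path under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, path)


if __name__ == "__main__":
    for label, rules in (("wildcard", WILDCARD_BLOCK), ("targeted", TARGETED_BLOCK)):
        # wildcard: both bots blocked; targeted: only GPTBot blocked.
        print(label, {bot: allowed(rules, bot) for bot in ("Googlebot", "GPTBot")})
```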

Ready-to-Use Configurations

Copy the configuration that matches your needs. Place the file at https://yoursite.com/robots.txt.

1. Block all AI training bots (recommended starting point)
Blocks GPTBot, ClaudeBot, CCBot, Bytespider, Cohere, and other training crawlers while keeping AI search bots (PerplexityBot, Bingbot) and Googlebot active.
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

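# ChatGPT-User makes on-demand fetches for ChatGPT users; OpenAI says it is not used for training.
# Keep this group only if you also want to stop those fetches.
User-agent: ChatGPT-User
Disallow: /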

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: HuggingFaceBot
Disallow: /

# Allow search engines (including AI-enhanced)
User-agent: *
Allow: /
2. Block ALL AI bots (maximum protection)
Blocks every known AI crawler — training and search. Your site will not appear in AI-powered search results (Perplexity, SearchGPT, etc.) or be used for training.
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Gemini
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: HuggingFaceBot
Disallow: /

User-agent: Ai2Bot
Disallow: /

User-agent: Kangaroo Bot
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: *
Allow: /
3. Allow AI search, block AI training
The most common balanced approach. AI search engines (Perplexity, SearchGPT, Bingbot) can index your content, but training crawlers cannot use it to train new models.
# Allow AI search engines
User-agent: PerplexityBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

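# Google-Extended is a control token, not a separate crawler. Allowing it also
# lets Google use your content for Gemini training; move it to the block list
# below if you want to opt out of that.
User-agent: Google-Extended
Allow: /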

User-agent: YouBot
Allow: /

User-agent: DuckAssistBot
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

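# ChatGPT-User makes on-demand fetches for ChatGPT users; OpenAI says it is not used for training.
# Keep this group only if you also want to stop those fetches.
User-agent: ChatGPT-User
Disallow: /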

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: HuggingFaceBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: *
Allow: /
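If you maintain the bot lists in code, generating the file avoids copy-paste slips between groups. A minimal sketch: build_robots_txt is a hypothetical helper that emits one group per bot in the same layout as the configurations above and ends with a catch-all group that allows everything.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(allow: list[str], block: list[str]) -> str:
    """Emit one group per bot, then a catch-all group that allows everything."""
    groups = [f"User-agent: {bot}\nAllow: /" for bot in allow]
    groups += [f"User-agent: {bot}\nDisallow: /" for bot in block]
    # Bots not named above (Googlebot, bingbot, ...) fall through to this group.
    groups.append("User-agent: *\nAllow: /")
    # A blank line separates groups.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    text = build_robots_txt(
        allow=["PerplexityBot", "OAI-SearchBot"],
        block=["GPTBot", "ClaudeBot", "CCBot"],
    )
    print(text)

    # Sanity-check the generated rules before publishing them.
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    for bot in ("PerplexityBot", "GPTBot", "Googlebot"):
        print(bot, parser.can_fetch(bot, "/"))
```

The explicit Allow groups are redundant with the final catch-all, but they document intent and make the file easier to audit.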
4. Block AI bots from specific directories only
Protect private content (member areas, drafts, API docs) while keeping your public pages indexable by AI crawlers.
# Block AI crawlers from private areas
User-agent: GPTBot
Disallow: /private/
Disallow: /members/
Disallow: /drafts/
Disallow: /api/

User-agent: ClaudeBot
Disallow: /private/
Disallow: /members/
Disallow: /drafts/
Disallow: /api/

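# Named bots ignore the * group, so list every protected path here
User-agent: PerplexityBot
Disallow: /private/
Disallow: /members/
Disallow: /drafts/
Disallow: /api/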

# All other bots: standard rules
User-agent: *
Disallow: /private/
Disallow: /drafts/
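Each named bot needs its own complete list of paths. Under RFC 9309, a crawler that finds a group naming it obeys only that group and ignores the * group. A sketch with urllib.robotparser shows the effect (the rules and the ExampleBot name are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: PerplexityBot has its own group; everyone else uses "*".
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /members/

User-agent: *
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# PerplexityBot matches its own group, so the "*" group's /drafts/ rule does not apply.
perplexity_drafts = parser.can_fetch("PerplexityBot", "/drafts/new-post")
# ExampleBot (hypothetical) has no group of its own and falls back to "*".
other_drafts = parser.can_fetch("ExampleBot", "/drafts/new-post")
perplexity_members = parser.can_fetch("PerplexityBot", "/members/")

print(perplexity_drafts, other_drafts, perplexity_members)  # True False False
```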

Per-Bot Quick Reference

The exact User-agent string to use for each major AI bot. User-agent matching in robots.txt is case-insensitive, but apart from letter case the token must match the bot's declared name exactly (no wildcards within the name).

Bot | Operator | User-agent string | Type | Respects robots.txt
GPTBot | OpenAI | GPTBot | Training | ✓ Yes
ChatGPT-User | OpenAI | ChatGPT-User | Assistant | ✓ Yes
OAI-SearchBot | OpenAI | OAI-SearchBot | AI Search | ✓ Yes
ClaudeBot | Anthropic | ClaudeBot | Training | ✓ Yes
PerplexityBot | Perplexity | PerplexityBot | AI Search | ✓ Yes
Google-Extended | Google | Google-Extended | Training/Search | ✓ Yes
Gemini | Google | Gemini | AI Search | ✓ Yes
Bingbot | Microsoft | bingbot | Search | ✓ Yes
CCBot | Common Crawl | CCBot | Training | ✓ Yes
Bytespider | ByteDance | Bytespider | Training | ✗ No
cohere-ai | Cohere | cohere-ai | Training | ✓ Yes
xAI-Bot | xAI | xAI-Bot | Training | ✓ Yes
MistralBot | Mistral AI | MistralBot | Training | ✓ Yes
HuggingFaceBot | Hugging Face | HuggingFaceBot | Training | ✓ Yes
YouBot | You.com | YouBot | AI Search | ✓ Yes
DuckAssistBot | DuckDuckGo | DuckAssistBot | AI Search | ✓ Yes

See the full AI Bot Directory for all 49 bots.

Bots That Ignore robots.txt

⚠️ Bytespider (ByteDance / TikTok)

Bytespider is operated by ByteDance, the parent company of TikTok. Multiple independent researchers have documented it ignoring Disallow rules. It has also been observed using disguised user-agent strings to bypass detection. robots.txt alone may not be sufficient — consider IP-level blocking via your server firewall or Cloudflare WAF rules.
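Your firewall or WAF is the right place for IP blocking. If you need a stopgap inside a Python WSGI application, the sketch below rejects requests from listed networks. The ranges are RFC 5737 documentation placeholders, not ByteDance's real networks, and the names BLOCKED_NETWORKS, is_blocked and BlockNetworksMiddleware are hypothetical.

```python
import ipaddress
from collections.abc import Callable, Iterable

# PLACEHOLDER ranges from the RFC 5737 documentation blocks. They are NOT
# ByteDance networks; substitute ranges you have verified yourself.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """True if client_ip lies in a blocked network; unparseable input is not blocked."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False
    # Membership is False when IP versions differ (IPv6 address vs IPv4 network).
    return any(address in network for network in BLOCKED_NETWORKS)


class BlockNetworksMiddleware:
    """WSGI middleware that answers 403 Forbidden to requests from blocked networks."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        # REMOTE_ADDR is the direct peer. Behind a proxy or CDN it is the proxy's
        # address, so you would need the client IP the proxy forwards instead.
        if is_blocked(environ.get("REMOTE_ADDR", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Stand-in for your real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


app = BlockNetworksMiddleware(hello_app)
```

Because it matches on IP rather than user-agent, this approach is not fooled by a disguised User-Agent header. It is only as good as the ranges you feed it, though.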

All other major AI companies (OpenAI, Anthropic, Google, Perplexity, Cohere, Mistral, xAI, Hugging Face) have published official compliance statements. Their bots check robots.txt before crawling and honour Disallow rules.

robots.txt vs. Meta Tags vs. HTTP Headers

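You have three complementary tools. robots.txt operates at the crawl level: it is checked before a page is fetched. Meta tags and HTTP headers operate per page: a crawler only sees them after retrieving the page, so they cannot take effect on URLs that robots.txt already disallows.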

robots.txt (site-wide or per-directory)

Checked before the bot fetches any page

User-agent: GPTBot
Disallow: /

Best for: Blanket rules for whole site or large sections

<meta name="robots"> (per page, HTML only)

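Found inside the <head> of an HTML page. Note that noai and noimageai are non-standard values: they are not part of the Robots Exclusion Protocol and are not among the robots directives Google documents, so support varies by crawler.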

<meta name="robots" content="noai, noimageai">

Best for: Page-level overrides, dynamic CMS pages
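To audit which directives a page actually declares, you can extract them from its HTML. Here is a minimal sketch with Python's html.parser. The class and function names are hypothetical, the names parameter lets you also collect crawler-specific tags such as name="googlebot", and finding a directive tells you nothing about whether a given crawler honours it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> tags (and any extra names given)."""

    def __init__(self, names: tuple[str, ...] = ("robots",)) -> None:
        super().__init__()
        self.names = {name.lower() for name in names}
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attribute values may be None.
        if tag != "meta":
            return
        attributes = {key: value or "" for key, value in attrs}
        if attributes.get("name", "").lower() not in self.names:
            return
        # Directives are comma-separated and case-insensitive.
        for directive in attributes.get("content", "").split(","):
            if directive.strip():
                self.directives.append(directive.strip().lower())


def robots_directives(html: str, names: tuple[str, ...] = ("robots",)) -> list[str]:
    """Return the lower-cased directives declared in the page's robots meta tags."""
    parser = RobotsMetaParser(names)
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
    print(robots_directives(page))  # ['noai', 'noimageai']
```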

X-Robots-Tag (per page or file, via HTTP header)

Returned in the HTTP response header

X-Robots-Tag: noai, noimageai

Best for: PDFs, images, API responses — non-HTML resources
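For non-HTML files the header has to come from the server. If your site happens to be a Python WSGI application, a small middleware can add it. This is a sketch under that assumption: the class name, the default value and the suffix list are illustrative choices, and most sites would set the same header in their web server or CDN configuration instead.

```python
from collections.abc import Callable, Iterable


class XRobotsTagMiddleware:
    """WSGI middleware that adds an X-Robots-Tag header to responses for chosen file types."""

    def __init__(self, app: Callable, value: str = "noai, noimageai",
                 suffixes: tuple[str, ...] = (".pdf", ".png", ".jpg")) -> None:
        self.app = app
        self.value = value
        self.suffixes = suffixes

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")
        if not path.lower().endswith(self.suffixes):
            return self.app(environ, start_response)

        def start_with_header(status, headers, exc_info=None):
            # Forward the wrapped app's headers plus our X-Robots-Tag.
            return start_response(status, list(headers) + [("X-Robots-Tag", self.value)], exc_info)

        return self.app(environ, start_with_header)


def file_app(environ, start_response):
    """Stand-in application that serves every path as a PDF."""
    start_response("200 OK", [("Content-Type", "application/pdf")])
    return [b"%PDF-1.7\n"]


app = XRobotsTagMiddleware(file_app)
```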


5 Common robots.txt Mistakes

1. Blocking Googlebot

When targeting AI bots, never give Googlebot (or bingbot) a Disallow: / rule, and never use a wildcard group (User-agent: * followed by Disallow: /): it will kill your entire search presence.

❌ Avoid:
User-agent: *
Disallow: /
2. Misspelling user-agent names

User-agent matching is case-insensitive, but the name must otherwise match exactly. "GPTBot" and "gptbot" both work; "GPT-Bot" (with a hyphen) and "GPT Bot" (with a space) do not.

❌ Avoid:
User-agent: GPT-Bot  # wrong — should be GPTBot
Disallow: /
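You can confirm how a token is matched by testing a few spellings. This sketch uses urllib.robotparser, whose matching is a lenient case-insensitive substring check (the bare token "Bot" would also match GPTBot, for instance). It therefore approximates, rather than reproduces, any particular crawler's parser. The blocks() helper is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocks(robots_txt: str, bot: str) -> bool:
    """True if robots_txt stops bot from fetching the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(bot, "/")


if __name__ == "__main__":
    # "GPTBot" and "gptbot" block GPTBot; "GPT-Bot" and "GPT Bot" match nothing,
    # so GPTBot falls through and is allowed.
    for token in ("GPTBot", "gptbot", "GPT-Bot", "GPT Bot"):
        print(repr(token), blocks(f"User-agent: {token}\nDisallow: /\n", "GPTBot"))
```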
3. Only blocking one OpenAI bot

OpenAI has three crawlers: GPTBot (training), ChatGPT-User (browsing), and OAI-SearchBot (search). Block all three if that is your intent.

❌ Avoid:
User-agent: GPTBot  # ChatGPT-User and OAI-SearchBot still active
Disallow: /
4. Assuming robots.txt stops all scrapers

robots.txt only controls crawlers that choose to honour it. Malicious scrapers and some commercial crawlers ignore it entirely. For sensitive content, use server-level authentication.

5. Not testing after changes

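Always validate your updated robots.txt with the Analyzer tool before deploying. A single misplaced line can accidentally block all crawlers. For example, a User-agent: GPTBot line placed directly above User-agent: * merges both into one group, so the Disallow: / that follows applies to every bot.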
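You can also encode your intent as an automated check that runs before each deploy. Here is a minimal sketch with urllib.robotparser. MUST_ALLOW, MUST_BLOCK and check_robots are hypothetical names, and the bot lists are examples to adjust to your own policy. The BROKEN sample reproduces the merged-group mistake: consecutive User-agent lines form a single group.

```python
from urllib.robotparser import RobotFileParser

# Example policy; adjust both lists to match your own intent.
MUST_ALLOW = ("Googlebot", "bingbot")
MUST_BLOCK = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "CCBot")


def check_robots(robots_txt: str, path: str = "/") -> list[str]:
    """Return a list of policy violations; an empty list means the file passes."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for bot in MUST_ALLOW:
        if not parser.can_fetch(bot, path):
            problems.append(f"{bot} is blocked from {path}")
    for bot in MUST_BLOCK:
        if parser.can_fetch(bot, path):
            problems.append(f"{bot} can still fetch {path}")
    return problems


# One group per AI bot, then a catch-all that allows everyone else.
GOOD = (
    "\n\n".join(f"User-agent: {bot}\nDisallow: /" for bot in MUST_BLOCK)
    + "\n\nUser-agent: *\nAllow: /\n"
)

# Mistake: consecutive User-agent lines form ONE group, so Disallow: / hits everyone.
BROKEN = """\
User-agent: GPTBot
User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    print(check_robots(GOOD))    # []
    print(check_robots(BROKEN))  # ['Googlebot is blocked from /', 'bingbot is blocked from /']
```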

Frequently Asked Questions

Do AI bots respect robots.txt?

Yes — all major AI companies (OpenAI, Anthropic, Google, Perplexity, Cohere, Mistral) officially honour robots.txt. The notable exception is Bytespider (ByteDance), which has been documented bypassing disallow rules. robots.txt is a protocol, not a technical barrier.

Will blocking AI bots hurt my SEO?

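Blocking AI training bots (GPTBot, ClaudeBot, CCBot) has zero impact on traditional SEO rankings. Blocking AI search bots (PerplexityBot, OAI-SearchBot) will remove your site from those AI search results, much as blocking Googlebot removes it from traditional search. Google-Extended is a special case: Google states that it does not affect inclusion or ranking in Google Search; it controls whether your content is used for Gemini training and grounding.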

How quickly do AI bots pick up robots.txt changes?

RFC 9309 advises crawlers not to rely on a cached robots.txt for more than 24 hours, so most AI crawlers should pick up changes within a day. Some may take up to a week to fully stop crawling newly-disallowed content. If your content has already been collected for training, robots.txt prevents future access; it doesn't retroactively remove previously collected content.
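RFC 9309 lets crawlers cache robots.txt but says they should not use a cached copy for more than 24 hours (unless the file is unreachable). Below is a sketch of that expiry policy. CachedRobots is a hypothetical class, and the loader and clock are injected so the behaviour can be exercised without network access.

```python
import time
from collections.abc import Callable
from typing import Optional
from urllib.robotparser import RobotFileParser


class CachedRobots:
    """Serve robots.txt decisions from a cache that expires after max_age seconds."""

    def __init__(self, fetch_text: Callable[[], str], max_age: float = 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self.fetch_text = fetch_text  # hypothetical loader, e.g. an HTTP GET of /robots.txt
        self.max_age = max_age
        self.clock = clock            # injectable so the expiry can be tested
        self.fetch_count = 0
        self._parser: Optional[RobotFileParser] = None
        self._fetched_at = 0.0

    def can_fetch(self, user_agent: str, url: str) -> bool:
        now = self.clock()
        if self._parser is None or now - self._fetched_at >= self.max_age:
            # Cache is empty or stale: re-download and re-parse the rules.
            parser = RobotFileParser()
            parser.parse(self.fetch_text().splitlines())
            self._parser, self._fetched_at = parser, now
            self.fetch_count += 1
        return self._parser.can_fetch(user_agent, url)
```

Under this policy, a Disallow you publish reaches a compliant crawler at its next refresh, at most 24 hours later. The sketch omits the RFC's exception for an unreachable robots.txt.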

Is there a retroactive opt-out from AI training?

OpenAI and Google offer forms to request removal of content already used for training, but results vary. robots.txt prevents future collection; retroactive removal requires contacting each company individually.

Should I block all AI bots or just some?

The most balanced approach: block AI training bots (GPTBot, ClaudeBot, CCBot, Bytespider) to protect your content from training datasets, while allowing AI search bots (PerplexityBot, OAI-SearchBot) so you stay visible in AI-powered search results. Decide on Google-Extended separately: allowing it keeps your content available to Gemini but also permits its use for Gemini training.

