AI Search Crawler · You.com

How to Block YouBot

YouBot is You.com's AI search crawler — it indexes your site so the You.com AI assistant can cite you in search results. Blocking it isn't just an AI training decision: it's a search visibility tradeoff.

🔍 AI Search Crawler: Powers You.com search results and AI assistant citations, like Googlebot for an AI-native search engine.
⚠ Visibility tradeoff: Blocking removes you from You.com AI answers and search results; it is not just an opt-out of training.
Not AI training: YouBot collects content for search indexing, not LLM training. This is a different decision than blocking GPTBot or ClaudeBot.

What Is YouBot?

YouBot is the web crawler operated by You.com, an AI-native search engine that launched publicly in 2021 and has grown into one of the more significant players in the AI search space. You.com competes directly with Perplexity AI and ChatGPT Search: it answers queries with AI-generated responses that cite web sources inline, alongside a traditional search results view.

YouBot's role is analogous to Googlebot: it crawls the web to build and maintain the search index that powers You.com AI answers. When a You.com user asks a question, the AI assistant draws on content YouBot has indexed — and typically displays your site as a clickable citation in its answer. If your site isn't in You.com's index, it won't appear in those answers.

Key distinction from AI training crawlers

YouBot is not collecting training data for an LLM. You.com doesn't train foundation models — it builds an AI-powered search product on top of existing models. YouBot's purpose is search indexing and citation retrieval, not building training datasets. The decision to block it is a search visibility decision, not a training data privacy decision.

YouBot user agent token
YouBot

Use this exact casing in robots.txt.

Should You Block YouBot?

Block if:
- You don't want your content appearing in You.com AI answers.
- You have exclusive content arrangements that restrict syndication.
- You want to control which AI-native search engines surface your work.
- You have no meaningful traffic from You.com and see no upside.

Don't block if:
- You want to be cited in You.com AI answers (referral traffic + brand visibility).
- You publish content that benefits from AI search discovery.
- You're already allowing PerplexityBot; the use case is nearly identical.
- You want visibility across all major AI search engines.

The Perplexity parallel

The YouBot decision is nearly identical to the PerplexityBot decision. Both are AI-native search crawlers. Both cite sources. Both represent a growing segment of AI search users. If you've thought through PerplexityBot and decided to allow or block it, apply the same logic to YouBot.

Option 1: Block via robots.txt

Block YouBot only (start here)
robots.txt
User-agent: YouBot
Disallow: /
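
To confirm that a deployed robots.txt actually contains the rule above, a small check can help. The following is a minimal sketch, not a full RFC 9309 parser: it only looks for a root-level `Disallow: /` in the group that applies to a given token, and it ignores wildcards and path precedence. The function names `parseRobots` and `isFullyBlocked` are hypothetical helpers invented for this illustration.

```typescript
// Simplified robots.txt check (a sketch, not a complete parser).
interface RobotsGroup {
  agents: string[];   // lowercased user-agent tokens for this group
  disallow: string[]; // raw Disallow paths
  allow: string[];    // raw Allow paths
}

function parseRobots(body: string): RobotsGroup[] {
  const groups: RobotsGroup[] = [];
  let current: RobotsGroup | null = null;
  let lastWasAgent = false;

  for (const rawLine of body.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
    if (line === '') continue;
    const colon = line.indexOf(':');
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === 'user-agent') {
      // Consecutive User-agent lines share one group; otherwise start a new one.
      if (!current || !lastWasAgent) {
        current = { agents: [], disallow: [], allow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      if (current && field === 'disallow') current.disallow.push(value);
      if (current && field === 'allow') current.allow.push(value);
      lastWasAgent = false;
    }
  }
  return groups;
}

// True when the group for `token` (or the "*" group as a fallback)
// has "Disallow: /" and no non-empty Allow rule.
function isFullyBlocked(body: string, token: string): boolean {
  const groups = parseRobots(body);
  const wanted = token.toLowerCase();
  let matching = groups.filter(g => g.agents.indexOf(wanted) !== -1);
  if (matching.length === 0) {
    matching = groups.filter(g => g.agents.indexOf('*') !== -1);
  }
  const hasRootDisallow = matching.some(g => g.disallow.indexOf('/') !== -1);
  const hasAnyAllow = matching.some(g => g.allow.some(p => p !== ''));
  return hasRootDisallow && !hasAnyAllow;
}

const sample = ['User-agent: YouBot', 'Disallow: /', ''].join('\n');
console.log(isFullyBlocked(sample, 'YouBot')); // true
```

The token comparison is lowercased here because RFC 9309 asks crawlers to match user-agent tokens case-insensitively; writing `YouBot` with its documented casing in the file itself remains the safer choice.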
Block all major AI search crawlers
robots.txt
# Block AI search crawlers (impacts search visibility in these engines)
User-agent: YouBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

# Keep traditional search engine crawlers allowed
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

This removes you from You.com, Perplexity, and ChatGPT Search results, which has a high impact on your AI search visibility.

Block AI training, allow AI search (common pattern; recommended for most sites)
robots.txt
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: Bytespider
Disallow: /

# Allow AI search crawlers — these cite your content and send traffic
User-agent: YouBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Allow traditional search
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /

Blocks training data collection while maximising AI search visibility. Most publishers benefit from this approach.
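
If you maintain this policy in code rather than by hand, the file above can be generated from a list of rules. The sketch below is one possible approach, assuming a simple all-or-nothing policy per crawler; `CrawlerRule` and `renderRobots` are hypothetical names introduced for this example, and the rule list is shortened for brevity.

```typescript
type Policy = 'allow' | 'block';

interface CrawlerRule {
  token: string;  // robots.txt user-agent token
  policy: Policy; // allow or block the whole site
}

// Render one "User-agent" group per rule, separated by blank lines.
function renderRobots(rules: CrawlerRule[], sitemapUrl?: string): string {
  const lines: string[] = [];
  for (const rule of rules) {
    lines.push(`User-agent: ${rule.token}`);
    lines.push(rule.policy === 'block' ? 'Disallow: /' : 'Allow: /');
    lines.push('');
  }
  if (sitemapUrl) {
    lines.push(`Sitemap: ${sitemapUrl}`);
    lines.push('');
  }
  return lines.join('\n');
}

// Shortened version of the policy above: block training, allow search.
const exampleRules: CrawlerRule[] = [
  { token: 'GPTBot', policy: 'block' },
  { token: 'ClaudeBot', policy: 'block' },
  { token: 'YouBot', policy: 'allow' },
  { token: 'PerplexityBot', policy: 'allow' },
  { token: '*', policy: 'allow' },
];

console.log(renderRobots(exampleRules));
```

Keeping the rules as data makes it easy to flip a single crawler such as YouBot from `allow` to `block` without editing the rest of the file.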

Option 2: Next.js App Router

app/robots.ts
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      // Block AI training crawlers
      { userAgent: 'GPTBot', disallow: ['/'] },
      { userAgent: 'ClaudeBot', disallow: ['/'] },
      { userAgent: 'anthropic-ai', disallow: ['/'] },
      { userAgent: 'Google-Extended', disallow: ['/'] },
      { userAgent: 'CCBot', disallow: ['/'] },
      { userAgent: 'Amazonbot', disallow: ['/'] },

      // Block YouBot if you don't want You.com search visibility
      { userAgent: 'YouBot', disallow: ['/'] },

      // Allow traditional search + AI search (comment out YouBot above to allow)
      { userAgent: 'Googlebot', allow: ['/'] },
      { userAgent: 'PerplexityBot', allow: ['/'] },
      { userAgent: '*', allow: ['/'] },
    ],
    sitemap: 'https://yoursite.com/sitemap.xml',
  };
}

Option 3: nginx — Hard Block

nginx.conf
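# Hard 403 block: enforced by the server, so it doesn't depend on robots.txt compliance.
# Place this inside a server (or location) block. The ~* operator matches case-insensitively.
if ($http_user_agent ~* "YouBot") {
    return 403;
}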
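
If you cannot change the nginx configuration, the same check can be applied in application code. The following is a rough sketch of the matching logic only, assuming you wire the returned status into whatever server framework you use; `BLOCKED_TOKENS` and `statusForUserAgent` are hypothetical names, and the User-Agent string in the usage line is illustrative rather than taken from You.com's documentation.

```typescript
// Tokens to hard-block. Matching is case-insensitive, like nginx's ~* operator.
const BLOCKED_TOKENS: string[] = ['YouBot'];

// Return the HTTP status to send for a request with this User-Agent header.
function statusForUserAgent(userAgent: string | undefined): number {
  const ua = (userAgent ?? '').toLowerCase();
  const blocked = BLOCKED_TOKENS.some(
    token => ua.indexOf(token.toLowerCase()) !== -1,
  );
  return blocked ? 403 : 200;
}

// Illustrative header value (an assumption, not a documented string).
console.log(statusForUserAgent('Mozilla/5.0 (compatible; YouBot)')); // 403
console.log(statusForUserAgent('Mozilla/5.0 (X11; Linux x86_64)')); // 200
```

As with the nginx rule, this relies on the crawler sending an honest User-Agent header, so it complements robots.txt rather than replacing it.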

Option 4: Cloudflare WAF Rule

Cloudflare WAF → Custom Rules → Expression
(http.user_agent contains "YouBot")

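Set the action to Block. You'll find custom rules under Cloudflare Dashboard → Security → WAF → Custom Rules. Note that the contains operator is case-sensitive, so the expression must use the exact YouBot casing.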

The AI Search Landscape: YouBot in Context

YouBot sits alongside a growing cohort of AI-native search crawlers that are changing how people discover content. Understanding the ecosystem helps you make the right allow/block decisions:

| Bot | Company | Purpose | Block impact |
|---|---|---|---|
| YouBot | You.com | AI search indexing | Lose You.com visibility |
| PerplexityBot | Perplexity AI | AI search indexing | Lose Perplexity visibility |
| OAI-SearchBot | OpenAI | ChatGPT Search indexing | Lose ChatGPT Search visibility |
| GPTBot | OpenAI | LLM training data | No search impact |
| ClaudeBot | Anthropic | LLM training data | No search impact |

Blocking AI search crawlers (YouBot, PerplexityBot, OAI-SearchBot) removes you from those engines' results. Blocking AI training crawlers (GPTBot, ClaudeBot, CCBot) has no search impact.
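
The distinction in the table can be captured as a small lookup, which is handy when auditing a block list. This is only a sketch built from the rows above; the `CRAWLERS` map and the `visibilityLost` helper are hypothetical names for illustration.

```typescript
type Purpose = 'ai-search' | 'llm-training';

interface CrawlerInfo {
  company: string;
  purpose: Purpose;
  searchEngine?: string; // engine whose results you leave when blocking
}

// Data copied from the table above.
const CRAWLERS: Record<string, CrawlerInfo> = {
  'YouBot': { company: 'You.com', purpose: 'ai-search', searchEngine: 'You.com' },
  'PerplexityBot': { company: 'Perplexity AI', purpose: 'ai-search', searchEngine: 'Perplexity' },
  'OAI-SearchBot': { company: 'OpenAI', purpose: 'ai-search', searchEngine: 'ChatGPT Search' },
  'GPTBot': { company: 'OpenAI', purpose: 'llm-training' },
  'ClaudeBot': { company: 'Anthropic', purpose: 'llm-training' },
};

// List the AI search engines you would disappear from for a given block list.
function visibilityLost(blockedTokens: string[]): string[] {
  const lost: string[] = [];
  for (const token of blockedTokens) {
    const info = CRAWLERS[token];
    if (info && info.purpose === 'ai-search' && info.searchEngine) {
      lost.push(info.searchEngine);
    }
  }
  return lost;
}

console.log(visibilityLost(['GPTBot', 'ClaudeBot'])); // []
console.log(visibilityLost(['GPTBot', 'YouBot']));    // [ 'You.com' ]
```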

Frequently Asked Questions

What is YouBot and what does it do?
YouBot is You.com's AI search crawler. It indexes your site so the You.com AI assistant can answer questions and cite your content as a source. It works like Googlebot for an AI-native search engine — not like GPTBot for LLM training.
Does blocking YouBot affect AI model training?
No. YouBot is a search indexing crawler, not an AI training crawler. Blocking it removes you from You.com search results — it has no effect on any AI training pipeline. To opt out of training, block GPTBot, ClaudeBot, CCBot, and similar training crawlers.
Should I block YouBot?
Only if you don't want visibility in You.com AI search results. If you allow PerplexityBot (the closest equivalent), there's usually no reason to block YouBot. If you're opting out of all AI search engines, block both.
What user agent does YouBot use?
The token is YouBot — capital Y, capital B. Use exactly: User-agent: YouBot followed by Disallow: / in your robots.txt.
How does You.com compare to Perplexity?
Both are AI-native search engines that cite web sources. You.com has a longer history (launched publicly in 2021), a more traditional search results view alongside AI answers, and built-in AI apps for coding and writing. Perplexity tends to be more focused on pure AI answer delivery. From a crawler perspective, the mechanics are essentially identical.
Will blocking YouBot affect my Google rankings?
No. YouBot and Googlebot are completely separate. Blocking YouBot has zero effect on Google Search, Google Discover, or any other Google product.
Does YouBot respect robots.txt?
Yes. You.com operates YouBot as a well-behaved search crawler that respects robots.txt Disallow directives. You.com is a US-based company following standard industry conventions.
