
How to Block ClaudeBot

ClaudeBot is Anthropic's training crawler for Claude AI models. Here's how to opt out — plus what Anthropic actually collects, and how to request content removal.

✓ Respects robots.txt — Anthropic reliably honors Disallow directives; no server-level block required.
✓ Removal form available — Anthropic offers a content removal request form, which is rare among AI companies.
✓ Two user agent tokens — block both ClaudeBot and anthropic-ai for full coverage.

What Does ClaudeBot Collect?

ClaudeBot crawls publicly available web pages to build training datasets for Anthropic's Claude models. It focuses on text content — articles, documentation, blog posts, and other written material that improves Claude's factual knowledge, writing quality, and reasoning ability.

Anthropic began more aggressive web crawling in late 2023 as it scaled training for Claude 2, Claude 3, and subsequent model families. Unlike some AI companies that rely primarily on licensed datasets, Anthropic uses web crawl data as a significant component of its training pipeline.

Anthropic uses two user agent tokens that publishers should be aware of: ClaudeBot (the primary one) and anthropic-ai (used in some contexts). A complete block requires both.
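Why both tokens matter can be sketched with Python's standard-library robots.txt parser. This is a minimal illustration, assuming a hypothetical robots.txt that blocks only the ClaudeBot token:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only one of Anthropic's two tokens
ONLY_CLAUDEBOT = """\
User-agent: ClaudeBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ONLY_CLAUDEBOT.splitlines())

# ClaudeBot is blocked, but the anthropic-ai token still slips through
print(parser.can_fetch("ClaudeBot", "/"))     # False
print(parser.can_fetch("anthropic-ai", "/"))  # True
```

With no `User-agent: anthropic-ai` group and no `User-agent: *` fallback, a crawler identifying as anthropic-ai is allowed everywhere — hence the two-group rules below.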

Option 1: Block via robots.txt (Recommended)

Block entire site — both Anthropic user agents (recommended)
robots.txt
User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

Block both tokens — Anthropic has used anthropic-ai as an alternate identifier.

Block specific paths (protect premium content)
robots.txt
# Block ClaudeBot from original/paid content
User-agent: ClaudeBot
Disallow: /articles/
Disallow: /premium/
Disallow: /research/

User-agent: anthropic-ai
Disallow: /articles/
Disallow: /premium/
Disallow: /research/

Block all major AI training crawlers
robots.txt
# Block all AI training crawlers
User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

# Normal search indexing — unaffected
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

Option 2: Next.js App Router

app/robots.ts
import { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      { userAgent: 'ClaudeBot', disallow: ['/'] },
      { userAgent: 'anthropic-ai', disallow: ['/'] },
      { userAgent: 'GPTBot', disallow: ['/'] },
      { userAgent: 'Google-Extended', disallow: ['/'] },
      { userAgent: 'Googlebot', allow: ['/'] },
      { userAgent: '*', allow: ['/'] },
    ],
    sitemap: 'https://yoursite.com/sitemap.xml',
  };
}

Option 3: Server-Level Block

Since Anthropic reliably respects robots.txt, server-level blocking is generally not needed. Use it only if you want hard 403 enforcement regardless of robots.txt.

nginx
if ($http_user_agent ~* "(ClaudeBot|anthropic-ai)") {
    return 403;
}
Cloudflare WAF Custom Rule
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "anthropic-ai")
→ Action: Block
Next.js Middleware
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

const ANTHROPIC_BOTS = ['ClaudeBot', 'anthropic-ai'];

export function middleware(request: NextRequest) {
  const ua = request.headers.get('user-agent') ?? '';
  if (ANTHROPIC_BOTS.some(bot => ua.includes(bot))) {
    return new NextResponse('Forbidden', { status: 403 });
  }
  return NextResponse.next();
}

export const config = {
  matcher: ['/((?!_next/static|_next/image|favicon.ico).*)'],
};

Request Removal of Existing Content

Anthropic offers a content removal request form — most AI companies don't

If your content has already been crawled, you can request that Anthropic exclude it from future training runs via their privacy portal. This is forward-looking — it cannot remove content from models already trained.

1. Visit privacy.anthropic.com and submit a data removal request.
2. Provide your domain or the specific URLs you want excluded.
3. Also add robots.txt rules — the removal form and robots.txt work independently.

Verify Your Block

1. Check your live robots.txt
https://yoursite.com/robots.txt

Confirm both ClaudeBot and anthropic-ai appear with Disallow: /.

2. Simulate ClaudeBot with curl
curl -A "ClaudeBot" -I https://yoursite.com/robots.txt
# Expect 200 — then no further requests from ClaudeBot
3. Grep server logs
grep -iE "claudebot|anthropic" /var/log/nginx/access.log | tail -20
# After block: only /robots.txt requests, nothing else
4. Use Open Shadow's robots.txt checker
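The manual checks above can also be scripted. Here is a sketch using Python's standard-library parser, assuming you have already fetched your robots.txt body as a string (the is_blocked helper and the sample text are illustrative, not an official tool):

```python
from urllib.robotparser import RobotFileParser

def is_blocked(robots_txt: str, agent: str, path: str = "/") -> bool:
    """Return True if `agent` is disallowed from `path` by this robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)

# Sample body matching the recommended rules from Option 1
sample = """\
User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /
"""

for agent in ("ClaudeBot", "anthropic-ai", "Googlebot"):
    print(agent, "blocked:", is_blocked(sample, agent))
# ClaudeBot blocked: True
# anthropic-ai blocked: True
# Googlebot blocked: False
```

Running this against your live robots.txt body confirms both Anthropic tokens are covered without waiting for crawler traffic to show up in logs.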

ClaudeBot vs Claude.ai Browsing — Know the Difference

Triggered by: ClaudeBot — Anthropic's automated training pipeline; Claude.ai browsing — a user asking Claude to visit a URL
Purpose: ClaudeBot — building training datasets; Claude.ai browsing — real-time information retrieval
User agent: ClaudeBot — ClaudeBot / anthropic-ai; Claude.ai browsing — varies (often a headless browser UA)
Blocked by robots.txt? ClaudeBot — yes ✓; Claude.ai browsing — partially (behavior varies)
Frequency: ClaudeBot — systematic, periodic sweeps; Claude.ai browsing — on-demand, triggered by users

Frequently Asked Questions

Does Anthropic respect robots.txt for ClaudeBot?
Yes. Anthropic has publicly committed to respecting robots.txt Disallow directives for ClaudeBot. Unlike Bytespider, ClaudeBot reliably honors opt-out requests. Anthropic also provides a dedicated removal form at privacy.anthropic.com for content already crawled.
What is the difference between ClaudeBot and Claude.ai browsing?
ClaudeBot is Anthropic's background training crawler — it systematically indexes the web to build training datasets. Claude.ai's web browsing is different: it fetches pages in real time when a user asks Claude to visit a specific URL. Blocking ClaudeBot in robots.txt does not reliably prevent on-demand browsing.
Can I request removal of my content from Claude's training data?
Anthropic provides a removal request form at privacy.anthropic.com. You can submit URLs or domains for review. Note: removal affects future training runs but cannot remove content from models already trained and deployed.
Does blocking ClaudeBot affect my site's appearance in Claude's answers?
Not immediately. Claude's knowledge comes from training data already collected. Blocking ClaudeBot stops future crawls but doesn't erase existing knowledge. Over time, as new Claude versions are trained, blocked content will be progressively excluded.
What user agent does ClaudeBot use?
ClaudeBot's full user agent is: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +https://www.anthropic.com/bot). In robots.txt, use the token 'ClaudeBot' — you don't need the full string. Also block 'anthropic-ai' as a second token Anthropic has used.

Related Tools

See Every Crawler on Your Site

Free AI visibility check — see which training bots have access to your content and generate a custom robots.txt to block them.