xAI-Bot is the web crawler that xAI (Elon Musk's AI company) uses to build training data for Grok, the AI assistant embedded across X (formerly Twitter) and available via grok.com. It launched publicly in early 2024 and quickly became one of the faster-growing AI training crawlers, actively targeting news, commentary, and real-time content. Here's how to opt out.
Unlike the general-purpose assistants offered by other AI companies, Grok is positioned as a real-time knowledge engine — its core differentiator is knowing about recent events. This makes xAI-Bot unusually aggressive on news sites, commentary platforms, industry blogs, and any content that covers current topics. If your site publishes timely content, xAI-Bot has almost certainly visited it.
xAI draws from two data sources: direct X/Twitter post data (which it has privileged access to as the parent company) and external web crawling via xAI-Bot. Blocking xAI-Bot addresses the web crawl channel. Content that users post about your site on X is a separate pipeline that robots.txt cannot control.
xAI-Bot identifies itself with the following user agent string:

```
Mozilla/5.0 (compatible; xAI-Bot/1.0; +https://x.ai/grok)
```

In robots.txt, use the token xAI-Bot — no need to match the full UA string.
robots.txt (Recommended)

```
User-agent: xAI-Bot
Disallow: /
```
One rule is all you need — xAI-Bot only uses the single xAI-Bot user agent token.
```
# Block xAI-Bot from articles and premium content
User-agent: xAI-Bot
Disallow: /articles/
Disallow: /news/
Disallow: /premium/
Disallow: /members/
```
```
# Block all major AI training crawlers
User-agent: xAI-Bot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# Normal search indexing — leave these alone
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /
```
The full block list. Does not affect search ranking — Googlebot and Bingbot are explicitly allowed.
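Before deploying, you can sanity-check the rules with Python's built-in robotparser. This is a quick local sketch — the robots.txt content is inlined (and abbreviated to two agents) purely for illustration:

```python
from urllib.robotparser import RobotFileParser

# A trimmed version of the block list above, inlined for a local test
ROBOTS_TXT = """\
User-agent: xAI-Bot
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# xAI-Bot should be denied everywhere; Googlebot should stay allowed
print(parser.can_fetch("xAI-Bot", "https://yoursite.com/articles/post"))    # False
print(parser.can_fetch("Googlebot", "https://yoursite.com/articles/post"))  # True
```

Swap in your full robots.txt and the user agents you care about; any unexpected True for a blocked agent means a rule isn't matching the way you think.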
Generate your robots.txt programmatically from app/robots.ts:
```typescript
import { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      { userAgent: 'xAI-Bot', disallow: ['/'] },
      { userAgent: 'GPTBot', disallow: ['/'] },
      { userAgent: 'ClaudeBot', disallow: ['/'] },
      { userAgent: 'anthropic-ai', disallow: ['/'] },
      { userAgent: 'Google-Extended', disallow: ['/'] },
      { userAgent: 'PerplexityBot', disallow: ['/'] },
      { userAgent: 'CCBot', disallow: ['/'] },
      { userAgent: 'Bytespider', disallow: ['/'] },
      { userAgent: 'Googlebot', allow: ['/'] },
      { userAgent: '*', allow: ['/'] },
    ],
    sitemap: 'https://yoursite.com/sitemap.xml',
  };
}
```

Since xAI-Bot reliably respects robots.txt, a server-level block is optional. Use it if you want guaranteed enforcement regardless of robots.txt, or to eliminate crawler load on your server entirely.
```nginx
# In your server {} block
if ($http_user_agent ~* "xAI-Bot") {
    return 403;
}
```

Returns HTTP 403 before the request reaches your application. Combine with robots.txt for belt-and-suspenders protection.
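Note that nginx's `~*` operator is a case-insensitive regular-expression match, so variants like `XAI-BOT` or `xai-bot` in the UA string are caught as well. A minimal Python sketch of the same matching logic (the helper function is hypothetical, for illustration only):

```python
import re

def is_blocked(user_agent: str) -> bool:
    """Mirror nginx's `~* "xAI-Bot"`: case-insensitive regex search."""
    return re.search(r"xAI-Bot", user_agent, re.IGNORECASE) is not None

print(is_blocked("Mozilla/5.0 (compatible; xAI-Bot/1.0; +https://x.ai/grok)"))  # True
print(is_blocked("Mozilla/5.0 (compatible; XAI-BOT/1.0)"))                      # True
print(is_blocked("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"))                 # False
```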
If your site is behind Cloudflare, create a custom WAF rule to block xAI-Bot at the network edge. Navigate to: Cloudflare Dashboard → Security → WAF → Custom Rules → Create rule, then use this expression:

```
(http.user_agent contains "xAI-Bot")
```

Set the action to Block. This drops requests before they hit your server — zero load, zero logging overhead.
To confirm xAI-Bot is seeing your robots.txt correctly, check your access logs for its user agent string:
```shell
# Check nginx access logs for xAI-Bot
grep "xAI-Bot" /var/log/nginx/access.log | tail -20

# Check if it's fetching robots.txt (good sign)
grep "xAI-Bot" /var/log/nginx/access.log | grep "robots.txt"

# Confirm blocked requests (nginx hard block)
grep "xAI-Bot" /var/log/nginx/access.log | grep " 403 "
```
Seeing xAI-Bot fetch /robots.txt followed by no requests to your content means the block is working. If you see xAI-Bot on content pages after adding the robots.txt rule, add a server-level block as a backup.
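For a quick picture of whether xAI-Bot hits are landing on content or being rejected, you can tally status codes per request. A minimal sketch assuming nginx's default "combined" log format — the sample log lines here are invented for illustration:

```python
from collections import Counter

# Sample lines in nginx "combined" format (invented for illustration)
LOG_LINES = [
    '1.2.3.4 - - [01/May/2024:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; xAI-Bot/1.0; +https://x.ai/grok)"',
    '1.2.3.4 - - [01/May/2024:10:00:05 +0000] "GET /articles/post HTTP/1.1" 403 0 "-" "Mozilla/5.0 (compatible; xAI-Bot/1.0; +https://x.ai/grok)"',
    '5.6.7.8 - - [01/May/2024:10:01:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 Chrome/120.0"',
]

statuses = Counter()
for line in LOG_LINES:
    if "xAI-Bot" not in line:
        continue
    # The status code is the first field after the closing quote of the request
    status = line.split('" ')[1].split()[0]
    statuses[status] += 1

print(dict(statuses))  # {'200': 1, '403': 1}
```

In practice you would read lines from /var/log/nginx/access.log instead of a list; a healthy hard block shows mostly 403s (plus 200s on /robots.txt), while 200s on content pages mean the crawler is still getting through.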
The right answer depends on your goals. Here's the honest tradeoff:
Blocking xAI-Bot stops web crawls but not the X/Twitter social pipeline. If your content gets shared, quoted, or discussed on X, that data flows into Grok's training through xAI's privileged access to the platform. This is a separate channel that robots.txt cannot control.