
How to Block xAI-Bot

xAI-Bot is the crawler from Elon Musk's xAI used to train Grok, the AI assistant embedded in X (formerly Twitter). It launched in 2024 and actively targets news, commentary, and real-time content. Here's how to opt out.

  • Respects robots.txt: xAI-Bot reliably honors Disallow directives, so robots.txt is sufficient for most publishers
  • Real-time focus: Grok prioritizes news and current events, which makes xAI-Bot especially active on publisher sites
  • X/Twitter pipeline: content shared on X can reach Grok training via the social feed; robots.txt only covers web crawls

What Does xAI-Bot Collect?

xAI-Bot crawls publicly available web pages to build training data for Grok, xAI's AI assistant embedded across X (formerly Twitter) and available via grok.com. It launched publicly in early 2024 and quickly became one of the faster-growing AI training crawlers.

Unlike other AI companies whose assistants are general-purpose, Grok is positioned as a real-time knowledge engine — its core differentiator is knowing about recent events. This makes xAI-Bot unusually aggressive on news sites, commentary platforms, industry blogs, and any content that covers current topics. If your site publishes timely content, xAI-Bot has almost certainly visited it.

xAI draws from two data sources: direct X/Twitter post data (which it has privileged access to as the parent company) and external web crawling via xAI-Bot. Blocking xAI-Bot addresses the web crawl channel. Content that users post about your site on X is a separate pipeline that robots.txt cannot control.

xAI-Bot user agent
Mozilla/5.0 (compatible; xAI-Bot/1.0; +https://x.ai/grok)

In robots.txt, use the token xAI-Bot — no need to match the full UA string.

Option 1: Block via robots.txt (Recommended)

Block entire site (recommended)
robots.txt
User-agent: xAI-Bot
Disallow: /

One rule is all you need — xAI-Bot only uses the single xAI-Bot user agent token.

Block specific paths (protect premium or time-sensitive content)
robots.txt
# Block xAI-Bot from articles and premium content
User-agent: xAI-Bot
Disallow: /articles/
Disallow: /news/
Disallow: /premium/
Disallow: /members/

Block all major AI training crawlers at once
robots.txt
# Block all major AI training crawlers
User-agent: xAI-Bot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# Normal search indexing — leave these alone
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

The full block list. Does not affect search ranking — Googlebot and Bingbot are explicitly allowed.

Option 2: Next.js App Router

Generate your robots.txt programmatically from app/robots.ts; the Next.js App Router serves the result at /robots.txt:

app/robots.ts
import { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      { userAgent: 'xAI-Bot', disallow: ['/'] },
      { userAgent: 'GPTBot', disallow: ['/'] },
      { userAgent: 'ClaudeBot', disallow: ['/'] },
      { userAgent: 'anthropic-ai', disallow: ['/'] },
      { userAgent: 'Google-Extended', disallow: ['/'] },
      { userAgent: 'PerplexityBot', disallow: ['/'] },
      { userAgent: 'CCBot', disallow: ['/'] },
      { userAgent: 'Bytespider', disallow: ['/'] },
      { userAgent: 'Googlebot', allow: ['/'] },
      { userAgent: '*', allow: ['/'] },
    ],
    sitemap: 'https://yoursite.com/sitemap.xml',
  };
}
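Next.js turns that object into plain robots.txt text at request time. As a sanity check on what the rules above produce, here is a hypothetical, simplified serializer; it handles only the fields used in this guide (userAgent, allow, disallow, sitemap) and is not the actual Next.js implementation:

```typescript
// Hypothetical, simplified sketch of the robots.txt serialization that
// Next.js performs for app/robots.ts. Only covers the fields used above.
type Rule = { userAgent: string; allow?: string[]; disallow?: string[] };

function serializeRobots(rules: Rule[], sitemap?: string): string {
  const blocks = rules.map((rule) =>
    [
      `User-agent: ${rule.userAgent}`,
      ...(rule.allow ?? []).map((path) => `Allow: ${path}`),
      ...(rule.disallow ?? []).map((path) => `Disallow: ${path}`),
    ].join('\n'),
  );
  if (sitemap) blocks.push(`Sitemap: ${sitemap}`);
  return blocks.join('\n\n') + '\n';
}
```

Feeding it the rules array from app/robots.ts should yield output equivalent to the hand-written robots.txt in Option 1, which makes it easy to diff the two approaches before deploying.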

Option 3: nginx — Hard 403 Block

Since xAI-Bot reliably respects robots.txt, a server-level block is optional. Use it if you want guaranteed enforcement regardless of robots.txt, or to reduce server load from crawler requests entirely.

nginx.conf
# In your server {} block
if ($http_user_agent ~* "xAI-Bot") {
    return 403;
}

Returns HTTP 403 before the request reaches your application. Combine with robots.txt for belt-and-suspenders protection.
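If you can't edit the web server config (for example, on a managed hosting platform), the same hard block can live in application code instead. A minimal sketch of the matching logic, using a hypothetical isAiTrainingBot helper that mirrors the robots.txt block list above; call it early in request handling and return 403 on a match:

```typescript
// Hypothetical user-agent check mirroring the robots.txt block list above.
// Token matching is case-insensitive, like nginx's ~* operator.
const AI_TRAINING_BOTS = [
  'xAI-Bot', 'GPTBot', 'ClaudeBot', 'anthropic-ai',
  'Google-Extended', 'PerplexityBot', 'Bytespider', 'CCBot',
];

function isAiTrainingBot(userAgent: string): boolean {
  const ua = userAgent.toLowerCase();
  return AI_TRAINING_BOTS.some((token) => ua.includes(token.toLowerCase()));
}
```

The full xAI-Bot user agent string matches on its token alone, so there is no need to test against the complete Mozilla/5.0 prefix.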

Option 4: Cloudflare WAF Rule

If your site is behind Cloudflare, create a custom WAF rule to block xAI-Bot at the network edge:

Cloudflare WAF → Custom Rules → Expression
(http.user_agent contains "xAI-Bot")

Set the action to Block. Requests are dropped at Cloudflare's edge before they ever reach your server, so your origin sees no load and no log noise from the crawler.

Navigate to: Cloudflare Dashboard → Security → WAF → Custom Rules → Create rule

Verify Your Block

To confirm xAI-Bot is seeing your robots.txt correctly, check your access logs for its user agent string:

bash
# Check nginx access logs for xAI-Bot
grep "xAI-Bot" /var/log/nginx/access.log | tail -20

# Check if it's fetching robots.txt (good sign)
grep "xAI-Bot" /var/log/nginx/access.log | grep "robots.txt"

# Confirm blocked requests (nginx hard block)
grep "xAI-Bot" /var/log/nginx/access.log | grep " 403 "

Seeing xAI-Bot fetch /robots.txt followed by no requests to your content means the block is working. If you see xAI-Bot on content pages after adding the robots.txt rule, add a server-level block as a backup.
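If you'd rather script the log check than eyeball grep output, the same tally can be sketched in a few lines. This assumes nginx's default "combined" log format, where the status code follows the quoted request line; adjust the pattern if you use a custom log_format:

```typescript
// Count served vs. blocked xAI-Bot requests in nginx "combined" log lines.
// Assumed format: ... "GET /path HTTP/1.1" 403 153 "-" "user agent"
function tallyXaiBot(lines: string[]): { served: number; blocked: number } {
  let served = 0;
  let blocked = 0;
  for (const line of lines) {
    if (!line.includes('xAI-Bot')) continue;
    const status = line.match(/" (\d{3}) /); // first status code after the request
    if (!status) continue;
    if (status[1] === '403') blocked += 1;
    else served += 1;
  }
  return { served, blocked };
}
```

A healthy hard block shows blocked counts climbing while served counts stay at zero after the rule went live.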

Should You Block xAI-Bot?

The right answer depends on your goals. Here's the honest tradeoff:

Block if…
  • You publish original analysis or journalism and don't want Grok summarizing it for free
  • You have paid content or member-only research
  • Your content is time-sensitive and being reproduced diminishes its value
  • You're already blocking other AI training crawlers for consistency
Allow if…
  • You want Grok to know about your products, services, or brand
  • Your audience is heavy X/Twitter users who might discover you via Grok
  • Your content is designed for wide distribution and AI reach is additive
  • You're in a B2C space where Grok visibility could drive real referral traffic
The X/Twitter pipeline caveat

Blocking xAI-Bot stops web crawls but not the X/Twitter social pipeline. If your content gets shared, quoted, or discussed on X, that data flows into Grok's training through xAI's privileged access to the platform. This is a separate channel that robots.txt cannot control.

Frequently Asked Questions

Does xAI-Bot respect robots.txt?
xAI has stated that xAI-Bot respects robots.txt. Independent testing since its 2024 launch has generally confirmed this — making a robots.txt Disallow directive sufficient for most publishers. For high-value content, pairing it with a server-level block adds certainty.
What user agent does xAI-Bot use?
xAI-Bot's full user agent string is: Mozilla/5.0 (compatible; xAI-Bot/1.0; +https://x.ai/grok). In robots.txt, use the token xAI-Bot — no need to match the full string. This token has been stable since xAI-Bot's public launch in February 2024.
Will blocking xAI-Bot affect my search rankings?
No. xAI-Bot is a training crawler, not a search indexer. Blocking it has no effect on how Google, Bing, or any other search engine crawls, indexes, or ranks your site.
Does xAI have a content removal form?
As of 2026, xAI does not have a widely documented public content removal request process. The robots.txt opt-out is the primary mechanism. For legal removal requests, xAI's legal team at x.ai is the appropriate contact.
Is xAI-Bot the same as Grok browsing the web?
No. xAI-Bot is a background training crawler. Grok's real-time web search (when a user asks Grok to search the web) uses a different agent. Blocking xAI-Bot stops background training crawls but not Grok's live search fetches.
My X/Twitter content is already public — does blocking xAI-Bot matter?
Blocking xAI-Bot stops crawls of your website specifically. Content posted on X or content others share about you on X can still reach Grok through xAI's direct platform data access — that's a separate pipeline robots.txt cannot block. The web crawl block only covers direct crawls of your domain.
