xAI-Bot is the web crawler that xAI (Elon Musk's AI company) uses to build training data for Grok, the AI assistant embedded across X (formerly Twitter) and available via grok.com. It launched publicly in early 2024 and quickly became one of the faster-growing AI training crawlers, actively targeting news, commentary, and real-time content. Here's how to opt out.
Unlike the general-purpose assistants offered by other AI companies, Grok is positioned as a real-time knowledge engine — its core differentiator is knowing about recent events. This makes xAI-Bot unusually aggressive on news sites, commentary platforms, industry blogs, and any content that covers current topics. If your site publishes timely content, xAI-Bot has almost certainly visited it.
xAI draws from two data sources: direct X/Twitter post data (which it has privileged access to as the parent company) and external web crawling via xAI-Bot. Blocking xAI-Bot addresses the web crawl channel. Content that users post about your site on X is a separate pipeline that robots.txt cannot control.
xAI-Bot identifies itself with the following user agent string:

```
Mozilla/5.0 (compatible; xAI-Bot/1.0; +https://x.ai/grok)
```

In robots.txt, use the token xAI-Bot — no need to match the full UA string.
robots.txt (Recommended)

```
User-agent: xAI-Bot
Disallow: /
```
One rule is all you need — xAI-Bot only uses the single xAI-Bot user agent token.
```
# Block xAI-Bot from articles and premium content
User-agent: xAI-Bot
Disallow: /articles/
Disallow: /news/
Disallow: /premium/
Disallow: /members/
```
```
# Block all major AI training crawlers
User-agent: xAI-Bot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# Normal search indexing — leave these alone
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /
```
The full block list. Does not affect search ranking — Googlebot and Bingbot are explicitly allowed.
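Before deploying, you can sanity-check the rules with Python's built-in robotparser. This is a quick local sketch — the robots.txt content is inlined (and abbreviated to two agents) purely for illustration:

```python
from urllib.robotparser import RobotFileParser

# A trimmed version of the block list above, inlined for a local test
ROBOTS_TXT = """\
User-agent: xAI-Bot
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# xAI-Bot should be denied everywhere; Googlebot should stay allowed
print(parser.can_fetch("xAI-Bot", "https://yoursite.com/articles/post"))    # False
print(parser.can_fetch("Googlebot", "https://yoursite.com/articles/post"))  # True
```

Swap in your full robots.txt and the user agents you care about; any unexpected True for a blocked agent means a rule isn't matching the way you think.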
Generate your robots.txt programmatically from app/robots.ts:
```typescript
import { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      { userAgent: 'xAI-Bot', disallow: ['/'] },
      { userAgent: 'GPTBot', disallow: ['/'] },
      { userAgent: 'ClaudeBot', disallow: ['/'] },
      { userAgent: 'anthropic-ai', disallow: ['/'] },
      { userAgent: 'Google-Extended', disallow: ['/'] },
      { userAgent: 'PerplexityBot', disallow: ['/'] },
      { userAgent: 'CCBot', disallow: ['/'] },
      { userAgent: 'Bytespider', disallow: ['/'] },
      { userAgent: 'Googlebot', allow: ['/'] },
      { userAgent: '*', allow: ['/'] },
    ],
    sitemap: 'https://yoursite.com/sitemap.xml',
  };
}
```

Since xAI-Bot reliably respects robots.txt, a server-level block is optional. Use it if you want guaranteed enforcement regardless of robots.txt, or to eliminate crawler load on your server entirely.
```nginx
# In your server {} block
if ($http_user_agent ~* "xAI-Bot") {
    return 403;
}
```

Returns HTTP 403 before the request reaches your application. Combine with robots.txt for belt-and-suspenders protection.
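Note that nginx's `~*` operator is a case-insensitive regular-expression match, so variants like `XAI-BOT` or `xai-bot` in the UA string are caught as well. A minimal Python sketch of the same matching logic (the helper function is hypothetical, for illustration only):

```python
import re

def is_blocked(user_agent: str) -> bool:
    """Mirror nginx's `~* "xAI-Bot"`: case-insensitive regex search."""
    return re.search(r"xAI-Bot", user_agent, re.IGNORECASE) is not None

print(is_blocked("Mozilla/5.0 (compatible; xAI-Bot/1.0; +https://x.ai/grok)"))  # True
print(is_blocked("Mozilla/5.0 (compatible; XAI-BOT/1.0)"))                      # True
print(is_blocked("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"))                 # False
```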
If your site is behind Cloudflare, create a custom WAF rule to block xAI-Bot at the network edge. Navigate to: Cloudflare Dashboard → Security → WAF → Custom Rules → Create rule, then use this expression:

```
(http.user_agent contains "xAI-Bot")
```

Set the action to Block. This drops requests before they hit your server — zero load, zero logging overhead.
To confirm xAI-Bot is seeing your robots.txt correctly, check your access logs for its user agent string:
```shell
# Check nginx access logs for xAI-Bot
grep "xAI-Bot" /var/log/nginx/access.log | tail -20

# Check if it's fetching robots.txt (good sign)
grep "xAI-Bot" /var/log/nginx/access.log | grep "robots.txt"

# Confirm blocked requests (nginx hard block)
grep "xAI-Bot" /var/log/nginx/access.log | grep " 403 "
```
Seeing xAI-Bot fetch /robots.txt followed by no requests to your content means the block is working. If you see xAI-Bot on content pages after adding the robots.txt rule, add a server-level block as a backup.
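For a quick picture of whether xAI-Bot hits are landing on content or being rejected, you can tally status codes per request. A minimal sketch assuming nginx's default "combined" log format — the sample log lines here are invented for illustration:

```python
from collections import Counter

# Sample lines in nginx "combined" format (invented for illustration)
LOG_LINES = [
    '1.2.3.4 - - [01/May/2024:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; xAI-Bot/1.0; +https://x.ai/grok)"',
    '1.2.3.4 - - [01/May/2024:10:00:05 +0000] "GET /articles/post HTTP/1.1" 403 0 "-" "Mozilla/5.0 (compatible; xAI-Bot/1.0; +https://x.ai/grok)"',
    '5.6.7.8 - - [01/May/2024:10:01:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 Chrome/120.0"',
]

statuses = Counter()
for line in LOG_LINES:
    if "xAI-Bot" not in line:
        continue
    # The status code is the first field after the closing quote of the request
    status = line.split('" ')[1].split()[0]
    statuses[status] += 1

print(dict(statuses))  # {'200': 1, '403': 1}
```

In practice you would read lines from /var/log/nginx/access.log instead of a list; a healthy hard block shows mostly 403s (plus 200s on /robots.txt), while 200s on content pages mean the crawler is still getting through.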
The right answer depends on your goals. Here's the honest tradeoff:
Blocking xAI-Bot stops web crawls but not the X/Twitter social pipeline. If your content gets shared, quoted, or discussed on X, that data flows into Grok's training through xAI's privileged access to the platform. This is a separate channel that robots.txt cannot control.