How is this different from Google Analytics?

Google Analytics shows you traffic. Shadow shows you traffic, AI bot activity, what AI platforms say about your brand, AND tells you what to do about all of it. It's analytics + AI intelligence + action steps in one tool.

Do I need to install anything?

For basic monitoring (bot detection, AI perception, readiness score) — nope, just enter your URL. For full visitor analytics (clicks, behavior, sessions), add one script tag. One-click integrations for Vercel, Shopify, WordPress, and more.

Will it slow down my site?

No. The script is under 5KB and loads async, so it stays off the critical rendering path and should not measurably affect page speed or Core Web Vitals. External monitoring has no impact at all, because it watches from the outside.

What AI bots does Shadow detect?

All of them. GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, Amazonbot, and dozens more. The Shadow Network means new bots get identified across all users instantly.

What do you mean by "actionable steps"?

Shadow doesn't just show you graphs. It says things like: "ChatGPT has your pricing wrong — add structured data to /pricing to fix it" or "Your bounce rate on /features is 68% — here's why and what to change." Specific, do-it-today recommendations.

Can Shadow block bots?

Shadow is a telescope, not a shield. It shows you who's visiting and what AI says about you. It generates block rules and robots.txt configs you can apply — but it doesn't intercept traffic.

Is Shadow privacy-friendly and GDPR compliant?

Yes. Shadow never collects PII. IP addresses are hashed after classification. No cookies are set on your visitors. All Shadow Network data is anonymized. GDPR compliant by design.

Can I block GPTBot from certain pages but allow it on others?

Yes. robots.txt supports path-level rules. You can disallow GPTBot from premium content, paywalled sections, or original writing while allowing it to crawl marketing pages or documentation where AI citations might benefit you. Use specific Disallow paths rather than Disallow: / for granular control.


AI Training · OpenAI

How to Block GPTBot

GPTBot is OpenAI's training crawler for GPT-4 and future models. Here's how to opt out — plus the critical difference between GPTBot, ChatGPT-User, and OAI-SearchBot.

✓ Respects robots.txt

OpenAI reliably honors Disallow directives for all three of its crawlers

3 separate agents

GPTBot, ChatGPT-User, and OAI-SearchBot need separate rules

No SEO impact

OpenAI crawlers are not connected to Google or Bing rankings

OpenAI Has Three Crawlers — Know the Difference

Before blocking, it's important to understand that OpenAI runs three distinct crawlers with different purposes. Blocking one does not block the others.

Bot	Purpose	Safe to block?	Tradeoff
GPTBot	AI model training (GPT-4+)	Yes ✓	Future GPT models won't train on your content
ChatGPT-User	Real-time browsing by ChatGPT users	Yes ✓	ChatGPT users can't fetch live content from your site
OAI-SearchBot	ChatGPT Search results	Consider carefully	Your site won't appear in ChatGPT Search answers

Option 1: Block via `robots.txt` (Recommended)

The robots.txt block is the standard OpenAI-endorsed opt-out method. Pick the level of blocking that fits your goals:

Block training only (allow ChatGPT Search) (most common)

robots.txt

User-agent: GPTBot
Disallow: /

Blocks AI training. OAI-SearchBot and ChatGPT-User can still access your site.

Block all OpenAI crawlers

robots.txt

# Block all OpenAI crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

Complete exclusion going forward. None of OpenAI's crawlers will fetch your pages, though content that was already crawled is not removed from existing models.

Block specific paths only

robots.txt

# Block GPTBot from premium/original content only
User-agent: GPTBot
Disallow: /articles/
Disallow: /premium/
Disallow: /research/
Allow: /

# Allow OAI-SearchBot for all paths (appear in ChatGPT Search)
User-agent: OAI-SearchBot
Allow: /
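
The GPTBot group above mixes Disallow and Allow lines. Under the Robots Exclusion Protocol (RFC 9309), the longest matching rule wins and Allow wins a tie, which is why the trailing Allow: / does not cancel the more specific Disallow paths. A minimal TypeScript sketch of that precedence, assuming plain prefix rules with no wildcards; the Rule type and isAllowed helper are hypothetical names for illustration, not part of any library:

```typescript
type Rule = { type: 'allow' | 'disallow'; path: string };

// Rules copied from the GPTBot group in the path-level example.
const gptBotRules: Rule[] = [
  { type: 'disallow', path: '/articles/' },
  { type: 'disallow', path: '/premium/' },
  { type: 'disallow', path: '/research/' },
  { type: 'allow', path: '/' },
];

// Longest matching prefix wins; Allow wins a tie; no matching rule means allowed.
function isAllowed(rules: Rule[], urlPath: string): boolean {
  let best: Rule | undefined;
  for (const rule of rules) {
    // An empty path matches nothing; otherwise require a prefix match.
    if (rule.path === '' || !urlPath.startsWith(rule.path)) continue;
    if (
      best === undefined ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.type === 'allow')
    ) {
      best = rule;
    }
  }
  return best === undefined || best.type === 'allow';
}
```

Note that /articles (no trailing slash) only matches Allow: / here, so it stays crawlable; add a separate Disallow for the index page if that matters.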

Block all major AI training crawlers at once

robots.txt

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

# Allow standard search indexing
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /
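
The file above repeats the same two lines for every crawler, so it can be generated from a list. A minimal TypeScript sketch, assuming you maintain the bot tokens yourself; buildRobotsTxt and AI_TRAINING_BOTS are hypothetical names, and the output is only as current as the list you pass in:

```typescript
// Crawler tokens copied from the robots.txt example above.
const AI_TRAINING_BOTS: string[] = [
  'GPTBot',
  'Google-Extended',
  'ClaudeBot',
  'PerplexityBot',
  'Bytespider',
  'CCBot',
];

// Emit one "User-agent / Disallow: /" group per blocked bot,
// followed by a catch-all group that allows everything else.
function buildRobotsTxt(blockedBots: string[]): string {
  const blockedGroups = blockedBots.map((bot) => `User-agent: ${bot}\nDisallow: /`);
  const catchAll = 'User-agent: *\nAllow: /';
  return blockedGroups.concat([catchAll]).join('\n\n') + '\n';
}
```

The sketch leaves out the explicit Googlebot and Bingbot groups, since a crawler with no group of its own falls back to the User-agent: * group.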

Option 2: Next.js App Router Config

app/robots.ts

import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      // Block training (GPTBot) and live browsing (ChatGPT-User)
      { userAgent: 'GPTBot', disallow: ['/'] },
      { userAgent: 'ChatGPT-User', disallow: ['/'] },
      // Allow ChatGPT Search to index your content
      { userAgent: 'OAI-SearchBot', allow: ['/'] },
      // All other crawlers, including standard search bots
      { userAgent: '*', allow: ['/'] },
    ],
    sitemap: 'https://yoursite.com/sitemap.xml',
  };
}

Option 3: IP-Level Block (nginx / Cloudflare)

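OpenAI publishes its crawler IP ranges (this guide references openai.com/gptbot-ranges.txt; confirm the current location in OpenAI's crawler documentation, since both the URL and the ranges can change). Because the ranges change periodically, IP blocking requires ongoing maintenance. Use it as a supplement to robots.txt, not a replacement.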
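
If you do maintain an IP list, the core check is whether a request's address falls inside one of the published CIDR ranges. A minimal IPv4-only TypeScript sketch; the ranges below are RFC 5737 documentation placeholders, not real OpenAI ranges, and ipv4ToInt, inCidr, and isFromListedRange are hypothetical helper names:

```typescript
// PLACEHOLDER ranges (RFC 5737 documentation blocks), NOT OpenAI's real ranges.
const EXAMPLE_RANGES: string[] = ['192.0.2.0/24', '198.51.100.0/28'];

// Convert a dotted-quad IPv4 address to an unsigned 32-bit number.
function ipv4ToInt(ip: string): number {
  const parts = ip.split('.');
  const valid =
    parts.length === 4 && parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) throw new Error(`Invalid IPv4 address: ${ip}`);
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

// True if `ip` lies inside the CIDR block, e.g. inCidr('192.0.2.7', '192.0.2.0/24').
function inCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf('/');
  if (slash === -1) throw new Error(`Invalid CIDR: ${cidr}`);
  const base = cidr.slice(0, slash);
  const bitsText = cidr.slice(slash + 1);
  const bits = Number(bitsText);
  if (!/^\d{1,2}$/.test(bitsText) || bits > 32) throw new Error(`Invalid CIDR: ${cidr}`);
  // Addresses share a block when they agree after dropping the host bits.
  const blockSize = Math.pow(2, 32 - bits);
  return Math.floor(ipv4ToInt(ip) / blockSize) === Math.floor(ipv4ToInt(base) / blockSize);
}

function isFromListedRange(ip: string, ranges: string[]): boolean {
  return ranges.some((range) => inCidr(ip, range));
}
```

In practice the range list would be refreshed from the published source on a schedule rather than hard-coded.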

nginx — block by user agent (recommended over IP)

nginx.conf

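# Block all three OpenAI crawlers at nginx level
# (place inside a server or location block)
if ($http_user_agent ~* "(GPTBot|ChatGPT-User|OAI-SearchBot)") {
    return 403;
}

# Alternative: return 404 instead, to avoid any fingerprinting.
# Use one of these blocks, not both; the first match returns immediately.
if ($http_user_agent ~* "GPTBot") {
    return 404;
}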

Cloudflare WAF rule

Cloudflare → Security → WAF → Custom Rules

Field:     User Agent
Operator:  contains
Value:     GPTBot
Action:    Block

# Add additional rule for ChatGPT-User if needed
Field:     User Agent
Operator:  contains
Value:     ChatGPT-User
Action:    Block

Next.js Middleware (user-agent blocking)

middleware.ts

import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

const BLOCKED_BOTS = ['GPTBot', 'ChatGPT-User'];

export function middleware(request: NextRequest) {
  const ua = request.headers.get('user-agent') ?? '';
  
  if (BLOCKED_BOTS.some(bot => ua.includes(bot))) {
    return new NextResponse('Forbidden', { status: 403 });
  }
  
  return NextResponse.next();
}

export const config = {
  matcher: ['/((?!_next/static|_next/image|favicon.ico).*)'],
};

Note: Middleware blocking works independently of robots.txt, at the HTTP layer. A matching bot receives a 403 even if robots.txt allows it, and the middleware does not change what your robots.txt says.
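
The middleware's ua.includes(bot) check is case-sensitive, while the nginx rule uses ~* (case-insensitive). If you want the two to behave the same way, a pure helper like the one below can replace the includes call and be unit tested without Next.js. This is a sketch under that assumption; isBlockedBot and escapeRegExp are hypothetical names:

```typescript
const BLOCKED_BOTS: string[] = ['GPTBot', 'ChatGPT-User'];

// Escape regex metacharacters so each token is matched literally.
function escapeRegExp(token: string): string {
  return token.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// One case-insensitive pattern covering every blocked token.
// No "g" flag, so test() carries no state between calls.
const BLOCKED_PATTERN = new RegExp(BLOCKED_BOTS.map(escapeRegExp).join('|'), 'i');

// Accepts null because Headers.get() returns null when the header is missing.
function isBlockedBot(userAgent: string | null): boolean {
  return userAgent !== null && BLOCKED_PATTERN.test(userAgent);
}
```

Inside the middleware this would be called as isBlockedBot(request.headers.get('user-agent')).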

Verify Your Block

1. Check your live robots.txt

https://yoursite.com/robots.txt

Confirm User-agent: GPTBot followed by Disallow: / appears.
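
The same check can be automated against the text of your robots.txt. A minimal TypeScript sketch that looks for a group naming the bot with a Disallow: / rule; it deliberately ignores wildcards, the User-agent: * fallback, and Allow overrides, and isFullyDisallowed is a hypothetical helper name:

```typescript
// True if `robotsTxt` has a group that names `botToken` and contains "Disallow: /".
function isFullyDisallowed(robotsTxt: string, botToken: string): boolean {
  const wanted = botToken.toLowerCase();
  let groupAgents: string[] = [];
  let readingAgents = false; // true while consecutive User-agent lines are collected

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
    const colon = line.indexOf(':');
    if (colon === -1) continue; // blank or malformed line
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === 'user-agent') {
      // A User-agent line after a rule line starts a new group.
      if (!readingAgents) groupAgents = [];
      groupAgents.push(value.toLowerCase());
      readingAgents = true;
    } else {
      readingAgents = false;
      if (field === 'disallow' && value === '/' && groupAgents.indexOf(wanted) !== -1) {
        return true;
      }
    }
  }
  return false;
}
```

To run it against a live site, pass in the body fetched from https://yoursite.com/robots.txt.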

2. Test with curl (simulate GPTBot)

# Simulate GPTBot fetching your homepage
curl -A "GPTBot" -I https://yoursite.com

# If blocked via nginx/middleware: expect 403 or 404
# If robots.txt only: expect 200 (robots.txt blocks crawling, not HTTP access)

3. Check server logs

grep -iE "gptbot|chatgpt" /var/log/nginx/access.log | tail -20

After a robots.txt block: GPTBot should only hit /robots.txt then stop.
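
That expectation can be checked mechanically by listing every path a given bot requested. A minimal TypeScript sketch, assuming nginx's combined log format with the user agent on each line; the sample lines are invented for illustration, and requestPath and pathsRequestedBy are hypothetical helper names:

```typescript
// Extract the request path from a combined-format log line, or null if absent.
function requestPath(line: string): string | null {
  const match = /"(?:GET|HEAD|POST) (\S+) HTTP\/[\d.]+"/.exec(line);
  return match?.[1] ?? null;
}

// Return the sorted, de-duplicated paths requested by lines mentioning `botToken`.
function pathsRequestedBy(logLines: string[], botToken: string): string[] {
  const token = botToken.toLowerCase();
  const paths = new Set<string>();
  for (const line of logLines) {
    if (line.toLowerCase().indexOf(token) === -1) continue;
    const path = requestPath(line);
    if (path !== null) paths.add(path);
  }
  return Array.from(paths).sort();
}

// Invented sample lines (documentation IPs, illustrative user agents).
const sampleLog: string[] = [
  '203.0.113.7 - - [10/Jan/2025:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 312 "-" "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
  '203.0.113.7 - - [11/Jan/2025:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 312 "-" "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
  '198.51.100.4 - - [11/Jan/2025:10:05:00 +0000] "GET /pricing HTTP/1.1" 200 8841 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
];
```

If pathsRequestedBy(lines, 'GPTBot') returns anything other than /robots.txt after the block has been live for a while, the rule is probably not being served or not matching.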

4. Use Open Shadow robots.txt checker

→ Check your robots.txt now

What Blocking GPTBot Actually Does (and Doesn't)

✓ What it DOES prevent

→ Your content being used in future GPT model training runs
→ OpenAI crawling new pages you publish
→ Bandwidth consumption from GPTBot crawl traffic
→ Your paywalled content feeding into AI datasets

✗ What it DOES NOT prevent

→ Content already crawled from being in existing model weights
→ ChatGPT-User browsing your site (separate bot)
→ ChatGPT citing your site from its existing knowledge
→ Other AI companies' bots (Google-Extended, ClaudeBot, etc.)

Should You Block GPTBot?

✓ Block it if you are:

→ A news publisher — original reporting is your product
→ Running a paid newsletter or subscription content
→ Concerned ChatGPT summarizes your articles instead of sending traffic
→ A creative writer — your voice and style have commercial value
→ Academic or research — IP concerns around training datasets

Consider allowing if you are:

→ Running open-source docs — AI training amplifies reach
→ A brand wanting AI citations and mentions for discovery
→ Building a content moat through sheer volume
→ Wanting to appear in ChatGPT Search answers (keep OAI-SearchBot at minimum)

Frequently Asked Questions

Does blocking GPTBot affect ChatGPT's ability to browse my site?

Blocking GPTBot only prevents OpenAI's training crawler. It does NOT prevent ChatGPT-User (the browsing agent used when ChatGPT users request a live page fetch) from accessing your site. If you want to block both training AND real-time browsing, you need to disallow both GPTBot and ChatGPT-User in robots.txt.

Does OpenAI respect robots.txt?

Yes. OpenAI has publicly committed to respecting robots.txt for GPTBot, ChatGPT-User, and OAI-SearchBot. Independent testing has confirmed this. Unlike Bytespider, which has been documented ignoring Disallow directives, GPTBot reliably backs off from disallowed paths.

What is the difference between GPTBot, ChatGPT-User, and OAI-SearchBot?

GPTBot is OpenAI's training crawler for future GPT models. ChatGPT-User fetches pages when a ChatGPT user asks it to browse a URL in real time. OAI-SearchBot powers ChatGPT Search results. All three respect robots.txt and have separate user agent tokens so you can control them independently.

Will blocking GPTBot remove my site from ChatGPT's knowledge base?

No. Content already crawled and used in training remains in existing model weights. Blocking GPTBot prevents future crawls from including your content in new training runs, but doesn't retroactively remove anything from deployed models.

Should I block OAI-SearchBot if I block GPTBot?

That depends on your goals. OAI-SearchBot powers ChatGPT Search — blocking it means your content won't appear when users search via ChatGPT. If you want traffic from ChatGPT Search, allow OAI-SearchBot while blocking GPTBot.