How to Block GPTBot

GPTBot is OpenAI's training crawler for GPT-4 and future models. Here's how to opt out — plus the critical difference between GPTBot, ChatGPT-User, and OAI-SearchBot.

✓ Respects robots.txt: OpenAI reliably honors Disallow directives for all three of its crawlers.
3 separate agents: GPTBot, ChatGPT-User, and OAI-SearchBot need separate rules.
No SEO impact: OpenAI crawlers are not connected to Google or Bing rankings.

OpenAI Has Three Crawlers — Know the Difference

Before blocking, it's important to understand that OpenAI runs three distinct crawlers with different purposes. Blocking one does not block the others.

Bot | Purpose | Safe to block? | Tradeoff
GPTBot | AI model training (GPT-4+) | Yes ✓ | Future GPT models won't train on your content
ChatGPT-User | Real-time browsing by ChatGPT users | Yes ✓ | ChatGPT users can't fetch live content from your site
OAI-SearchBot | ChatGPT Search results | Consider carefully | Your site won't appear in ChatGPT Search answers
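Because the three crawlers announce themselves with different user-agent tokens, traffic can be sorted by token before you decide what to block. The sketch below is illustrative only: the function name `classifyOpenAIBot` and the case-insensitive substring match are assumptions, not anything OpenAI specifies.

```typescript
// Tokens taken from the table above; purpose labels are paraphrased.
type OpenAIBot = 'GPTBot' | 'ChatGPT-User' | 'OAI-SearchBot';

const OPENAI_BOTS: Record<OpenAIBot, string> = {
  'GPTBot': 'model training',
  'ChatGPT-User': 'real-time browsing',
  'OAI-SearchBot': 'ChatGPT Search',
};

// Returns the matching OpenAI token, or null for any other client.
// Assumption: a case-insensitive substring test is enough, since real
// user-agent strings wrap the token in version and URL text.
function classifyOpenAIBot(userAgent: string): OpenAIBot | null {
  const ua = userAgent.toLowerCase();
  for (const token of Object.keys(OPENAI_BOTS) as OpenAIBot[]) {
    if (ua.includes(token.toLowerCase())) return token;
  }
  return null;
}
```

A user-agent header is self-reported, so this identifies well-behaved crawlers only; anything can claim or hide these tokens.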

Option 1: Block via robots.txt (Recommended)

The robots.txt block is the standard OpenAI-endorsed opt-out method. Pick the level of blocking that fits your goals:

Most common: block training only (allow ChatGPT Search)
robots.txt
User-agent: GPTBot
Disallow: /

Blocks AI training. OAI-SearchBot and ChatGPT-User can still access your site.

Block all OpenAI crawlers
robots.txt
# Block all OpenAI crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

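Complete exclusion from future crawling. OpenAI's crawlers will no longer fetch your content for training, live browsing, or ChatGPT Search. Content already crawled is not affected.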
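If you maintain the block list in code rather than by hand, the rules above can be generated from a list of tokens. This is a minimal sketch; `buildBlockRules` is a hypothetical helper name, and it only emits the full-site `Disallow: /` form shown above.

```typescript
// Build robots.txt text that disallows the whole site for each token.
// Groups are separated by a blank line, matching the examples above.
function buildBlockRules(userAgents: string[]): string {
  return userAgents
    .map((ua) => `User-agent: ${ua}\nDisallow: /`)
    .join('\n\n') + '\n';
}

// Usage: the "block all OpenAI crawlers" configuration.
const blockAllOpenAI = buildBlockRules(['GPTBot', 'ChatGPT-User', 'OAI-SearchBot']);
```

Keeping the token list in one array makes it harder to block one crawler and forget the other two.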

Block specific paths only
robots.txt
# Block GPTBot from premium/original content only
User-agent: GPTBot
Disallow: /articles/
Disallow: /premium/
Disallow: /research/
Allow: /

# Allow OAI-SearchBot for all paths (appear in ChatGPT Search)
User-agent: OAI-SearchBot
Allow: /
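The trailing `Allow: /` in the GPTBot group does not undo the `Disallow` lines. Under the Robots Exclusion Protocol (RFC 9309), the rule with the longest matching path wins, and `Allow` wins a tie. The sketch below models only that precedence rule with plain prefix matching (no `*` or `$` wildcards). The `Rule` type and `isAllowed` function are illustrative names, and whether a given crawler follows the RFC exactly is an assumption.

```typescript
interface Rule {
  type: 'allow' | 'disallow';
  path: string;
}

// Longest matching prefix wins; on equal length, allow beats disallow.
// A path with no matching rule is allowed by default.
function isAllowed(rules: Rule[], urlPath: string): boolean {
  let best: Rule | null = null;
  for (const rule of rules) {
    if (rule.path === '' || !urlPath.startsWith(rule.path)) continue;
    if (
      best === null ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.type === 'allow')
    ) {
      best = rule;
    }
  }
  return best === null || best.type === 'allow';
}

// The GPTBot group from the example above.
const gptBotRules: Rule[] = [
  { type: 'disallow', path: '/articles/' },
  { type: 'disallow', path: '/premium/' },
  { type: 'disallow', path: '/research/' },
  { type: 'allow', path: '/' },
];
```

With these rules, a path under `/articles/` is blocked because `/articles/` is a longer match than `/`, while a path such as `/about` falls through to `Allow: /`.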

Block all major AI training crawlers at once
robots.txt
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

# Allow standard search indexing
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /

Option 2: Next.js App Router Config

app/robots.ts
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      // Block training (GPTBot) and real-time browsing (ChatGPT-User)
      { userAgent: 'GPTBot', disallow: ['/'] },
      { userAgent: 'ChatGPT-User', disallow: ['/'] },
      // Allow ChatGPT Search to index your content
      { userAgent: 'OAI-SearchBot', allow: ['/'] },
      // Normal search bots
      { userAgent: '*', allow: ['/'] },
    ],
    sitemap: 'https://yoursite.com/sitemap.xml',
  };
}

Option 3: Server-Level Block (nginx / Cloudflare / Next.js)

OpenAI publishes its crawler IP ranges at openai.com/gptbot-ranges.txt. These change periodically, so IP blocking requires maintenance. Use this as a supplement to robots.txt, not a replacement.
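If you do want to check an address against the published ranges, the core operation is a CIDR membership test. The sketch below handles IPv4 only and assumes you have already extracted the CIDR strings from OpenAI's list. The ranges shown are reserved documentation addresses used as placeholders, not OpenAI's real ranges, and the function names are illustrative.

```typescript
// Convert a dotted-quad IPv4 string to an unsigned 32-bit number.
function ipv4ToInt(ip: string): number {
  const parts = ip.split('.');
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  return parts.reduce((acc, p) => acc * 256 + Number(p), 0);
}

// True when `ip` falls inside the IPv4 CIDR block, e.g. "192.0.2.0/24".
function inCidr(ip: string, cidr: string): boolean {
  const [base = '', bitsText = ''] = cidr.split('/');
  const bits = Number(bitsText);
  if (bitsText === '' || !Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`Invalid CIDR block: ${cidr}`);
  }
  const blockSize = 2 ** (32 - bits); // number of addresses in the block
  return Math.floor(ipv4ToInt(ip) / blockSize) === Math.floor(ipv4ToInt(base) / blockSize);
}

// PLACEHOLDER ranges (reserved documentation blocks), not OpenAI's.
const exampleRanges = ['192.0.2.0/24', '198.51.100.4/30'];

function isInRanges(ip: string, ranges: string[]): boolean {
  return ranges.some((cidr) => inCidr(ip, cidr));
}
```

Because the published ranges change, a check like this is only as good as the last time the list was refreshed.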

nginx — block by user agent (recommended over IP)
nginx.conf
# Block all three OpenAI crawlers at the nginx level
# (place inside a server or location block)
if ($http_user_agent ~* "(GPTBot|ChatGPT-User|OAI-SearchBot)") {
    return 403;
}

# Alternative to the rule above: return 404 for GPTBot so the block is less obvious
if ($http_user_agent ~* "GPTBot") {
    return 404;
}

Cloudflare WAF rule
Cloudflare → Security → WAF → Custom Rules
Field:     User Agent
Operator:  contains
Value:     GPTBot
Action:    Block

# Add additional rule for ChatGPT-User if needed
Field:     User Agent
Operator:  contains
Value:     ChatGPT-User
Action:    Block

Next.js Middleware (user-agent blocking)
middleware.ts
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

const BLOCKED_BOTS = ['GPTBot', 'ChatGPT-User'];

export function middleware(request: NextRequest) {
  const ua = request.headers.get('user-agent') ?? '';
  
  if (BLOCKED_BOTS.some(bot => ua.includes(bot))) {
    return new NextResponse('Forbidden', { status: 403 });
  }
  
  return NextResponse.next();
}

export const config = {
  matcher: ['/((?!_next/static|_next/image|favicon.ico).*)'],
};

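Note: Middleware blocking works at the HTTP layer, independently of robots.txt. A matching request gets a 403 even if robots.txt allows that bot. Because the matcher above also covers /robots.txt, blocked bots cannot read your robots.txt rules either.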

Verify Your Block

1. Check your live robots.txt
https://yoursite.com/robots.txt

Confirm User-agent: GPTBot followed by Disallow: / appears.
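The same check can be scripted against the text of your robots.txt. This is a simplified sketch, not a full parser: `isFullyDisallowed` is a hypothetical helper that only looks for a literal `Disallow: /` in a group naming the token, and it ignores `User-agent: *` fallbacks, wildcards, and `Allow` overrides.

```typescript
// True when some group that names `token` contains "Disallow: /".
function isFullyDisallowed(robotsTxt: string, token: string): boolean {
  const wanted = token.toLowerCase();
  let agents: string[] = [];  // user-agent tokens of the group being read
  let readingRules = false;   // true once the current group has a rule line
  let blocked = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, '').trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === 'user-agent') {
      // A User-agent line after rule lines starts a new group.
      if (readingRules) {
        agents = [];
        readingRules = false;
      }
      agents.push(value.toLowerCase());
    } else if (field === 'allow' || field === 'disallow') {
      readingRules = true;
      if (field === 'disallow' && value === '/' && agents.includes(wanted)) {
        blocked = true;
      }
    }
  }
  return blocked;
}
```

Fetching the file (for example with `fetch('https://yoursite.com/robots.txt')`) is left out so the function stays testable on plain strings.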

2. Test with curl (simulate GPTBot)
# Simulate GPTBot fetching your homepage
curl -A "GPTBot" -I https://yoursite.com

# If blocked via nginx/middleware: expect 403 or 404
# If robots.txt only: expect 200 (robots.txt blocks crawling, not HTTP access)

3. Check server logs
grep -iE "gptbot|chatgpt" /var/log/nginx/access.log | tail -20

After a robots.txt block: GPTBot should only hit /robots.txt then stop.
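To confirm that pattern without reading raw log lines, you can tally which paths a given bot requested. The sketch below assumes nginx's default "combined" log format, where the user agent is the last quoted field. `pathsRequestedBy` is an illustrative name, and the regular expression will need adjusting for a custom log format.

```typescript
// Count requests per path for log lines whose user agent contains `token`.
function pathsRequestedBy(logLines: string[], token: string): Map<string, number> {
  const counts = new Map<string, number>();
  const wanted = token.toLowerCase();
  // Captures the request path and the final quoted field (the user agent).
  const pattern = /"[A-Z]+ (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"\s*$/;

  for (const line of logLines) {
    const match = pattern.exec(line);
    if (!match) continue; // skip lines that do not fit the assumed format
    const path = match[1] ?? '';
    const userAgent = match[2] ?? '';
    if (userAgent.toLowerCase().includes(wanted)) {
      counts.set(path, (counts.get(path) ?? 0) + 1);
    }
  }
  return counts;
}
```

If the block is working, the map for `GPTBot` should contain `/robots.txt` and nothing else for requests made after the rule went live.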

4. Use Open Shadow robots.txt checker
→ Check your robots.txt now

What Blocking GPTBot Actually Does (and Doesn't)

✓ What it DOES prevent

  • Your content being used in future GPT model training runs
  • OpenAI crawling new pages you publish
  • Bandwidth consumption from GPTBot crawl traffic
  • Your paywalled content feeding into AI datasets

✗ What it DOES NOT prevent

  • Content already crawled from being in existing model weights
  • ChatGPT-User browsing your site (separate bot)
  • ChatGPT citing your site from its existing knowledge
  • Other AI companies' bots (Google-Extended, ClaudeBot, etc.)

Should You Block GPTBot?

✓ Block it if you are:

  • A news publisher — original reporting is your product
  • Running a paid newsletter or subscription content
  • Concerned ChatGPT summarizes your articles instead of sending traffic
  • A creative writer — your voice and style have commercial value
  • Academic or research — IP concerns around training datasets

Consider allowing if you are:

  • Running open-source docs — AI training amplifies reach
  • A brand wanting AI citations and mentions for discovery
  • Building a content moat through sheer volume
  • Wanting to appear in ChatGPT Search answers (keep OAI-SearchBot at minimum)

Frequently Asked Questions

Does blocking GPTBot affect ChatGPT's ability to browse my site?
Blocking GPTBot only prevents OpenAI's training crawler. It does NOT prevent ChatGPT-User (the browsing agent used when ChatGPT users request a live page fetch) from accessing your site. If you want to block both training AND real-time browsing, you need to disallow both GPTBot and ChatGPT-User in robots.txt.

Does OpenAI respect robots.txt?
Yes. OpenAI has publicly committed to respecting robots.txt for GPTBot, ChatGPT-User, and OAI-SearchBot. Independent testing has confirmed this. Unlike Bytespider, which has been documented ignoring Disallow directives, GPTBot reliably backs off from disallowed paths.

What is the difference between GPTBot, ChatGPT-User, and OAI-SearchBot?
GPTBot is OpenAI's training crawler for future GPT models. ChatGPT-User fetches pages when a ChatGPT user asks it to browse a URL in real time. OAI-SearchBot powers ChatGPT Search results. All three respect robots.txt and have separate user agent tokens so you can control them independently.

Will blocking GPTBot remove my site from ChatGPT's knowledge base?
No. Content already crawled and used in training remains in existing model weights. Blocking GPTBot prevents future crawls from including your content in new training runs, but doesn't retroactively remove anything from deployed models.

Should I block OAI-SearchBot if I block GPTBot?
That depends on your goals. OAI-SearchBot powers ChatGPT Search — blocking it means your content won't appear when users search via ChatGPT. If you want traffic from ChatGPT Search, allow OAI-SearchBot while blocking GPTBot.
