GPTBot is OpenAI's training crawler for GPT-4 and future models. Here's how to opt out — plus the critical difference between GPTBot, ChatGPT-User, and OAI-SearchBot.
Before blocking, it's important to understand that OpenAI runs three distinct crawlers with different purposes. Blocking one does not block the others.
| Bot | Purpose | Safe to block? | Tradeoff |
|---|---|---|---|
| GPTBot | AI model training (GPT-4+) | Yes ✓ | Future GPT models won't train on your content |
| ChatGPT-User | Real-time browsing by ChatGPT users | Yes ✓ | ChatGPT users can't fetch live content from your site |
| OAI-SearchBot | ChatGPT Search results | Consider carefully | Your site won't appear in ChatGPT Search answers |
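If you make per-bot decisions in application code, the table above can be mirrored by a small classifier. This is a sketch with a hypothetical function name; it relies only on the fact that each crawler's User-Agent string contains its token:

```typescript
// Hypothetical helper: map an incoming User-Agent string to one of
// OpenAI's three crawlers, mirroring the table above.
type OpenAIBot = 'GPTBot' | 'ChatGPT-User' | 'OAI-SearchBot';

function classifyOpenAIBot(userAgent: string): OpenAIBot | null {
  // The three crawler tokens are distinct, so a substring match suffices.
  const bots: OpenAIBot[] = ['OAI-SearchBot', 'ChatGPT-User', 'GPTBot'];
  return bots.find((bot) => userAgent.includes(bot)) ?? null;
}
```

With this in place, one code path can log training-crawler traffic separately from search-crawler traffic before deciding what to block.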
## robots.txt (Recommended)

The robots.txt block is the standard, OpenAI-endorsed opt-out method. Pick the level of blocking that fits your goals:
```
User-agent: GPTBot
Disallow: /
```
Blocks AI training. OAI-SearchBot and ChatGPT-User can still access your site.
```
# Block all OpenAI crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /
```
Complete exclusion. Your content won't appear in any OpenAI product.
```
# Block GPTBot from premium/original content only
User-agent: GPTBot
Disallow: /articles/
Disallow: /premium/
Disallow: /research/
Allow: /

# Allow OAI-SearchBot for all paths (appear in ChatGPT Search)
User-agent: OAI-SearchBot
Allow: /
```
```
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

# Allow standard search indexing
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /
```
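If the bot list lives in code anyway, a small generator keeps robots.txt in sync with your policy. This is a sketch; the `buildRobotsTxt` helper and its input shape are hypothetical, but the output matches the format of the examples above:

```typescript
// Hypothetical helper: build a robots.txt body from a block/allow policy.
interface RobotsPolicy {
  blocked: string[]; // user agents denied all paths
  allowed: string[]; // user agents allowed all paths
}

function buildRobotsTxt(policy: RobotsPolicy): string {
  const block = policy.blocked.map((bot) => `User-agent: ${bot}\nDisallow: /`);
  const allow = policy.allowed.map((bot) => `User-agent: ${bot}\nAllow: /`);
  return [...block, ...allow].join('\n\n') + '\n';
}
```

For example, `buildRobotsTxt({ blocked: ['GPTBot', 'CCBot'], allowed: ['Googlebot', '*'] })` emits the "block AI training, allow search" policy shown above.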
```typescript
import { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      // Block AI training (GPTBot) and live browsing (ChatGPT-User)
      { userAgent: 'GPTBot', disallow: ['/'] },
      { userAgent: 'ChatGPT-User', disallow: ['/'] },
      // Allow ChatGPT Search to index your content
      { userAgent: 'OAI-SearchBot', allow: ['/'] },
      // Normal search bots
      { userAgent: '*', allow: ['/'] },
    ],
    sitemap: 'https://yoursite.com/sitemap.xml',
  };
}
```

OpenAI publishes its crawler IP ranges at openai.com/gptbot-ranges.txt. These change periodically, so IP blocking requires ongoing maintenance. Use it as a supplement to robots.txt, not a replacement.
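As a sketch of what IP-level verification could look like, here is a minimal IPv4 CIDR membership check. It assumes you fetch and parse the published ranges file yourself; the CIDRs in the test are placeholders, not OpenAI's actual ranges:

```typescript
// Sketch: check whether a request IP falls inside a set of IPv4 CIDR
// ranges. In practice the ranges would come from the published file
// (openai.com/gptbot-ranges.txt), refreshed periodically.
function ipv4ToInt(ip: string): number {
  return ip.split('.').reduce((acc, octet) => ((acc << 8) | parseInt(octet, 10)) >>> 0, 0);
}

function inCidr(ip: string, cidr: string): boolean {
  const [base, bits] = cidr.split('/');
  const mask = Number(bits) === 0 ? 0 : (~0 << (32 - Number(bits))) >>> 0;
  return (ipv4ToInt(ip) & mask) === (ipv4ToInt(base) & mask);
}

function isKnownCrawlerIp(ip: string, ranges: string[]): boolean {
  return ranges.some((cidr) => inCidr(ip, cidr));
}
```

Pairing this with the user-agent check catches impostors: a request claiming to be GPTBot from an IP outside the published ranges is not GPTBot.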
```nginx
# Block all three OpenAI crawlers at the nginx level
if ($http_user_agent ~* "(GPTBot|ChatGPT-User|OAI-SearchBot)") {
    return 403;
}

# Or return 404 to avoid any fingerprinting
if ($http_user_agent ~* "GPTBot") {
    return 404;
}
```

The same block can be expressed as a user-agent firewall rule (the Field/Operator/Value layout below follows Cloudflare-style custom rules):

```
Field: User Agent
Operator: contains
Value: GPTBot
Action: Block

# Add an additional rule for ChatGPT-User if needed
Field: User Agent
Operator: contains
Value: ChatGPT-User
Action: Block
```
```typescript
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

const BLOCKED_BOTS = ['GPTBot', 'ChatGPT-User'];

export function middleware(request: NextRequest) {
  const ua = request.headers.get('user-agent') ?? '';
  if (BLOCKED_BOTS.some((bot) => ua.includes(bot))) {
    return new NextResponse('Forbidden', { status: 403 });
  }
  return NextResponse.next();
}

export const config = {
  matcher: ['/((?!_next/static|_next/image|favicon.ico).*)'],
};
```

Note: middleware blocking operates at the HTTP layer, independently of robots.txt. It denies requests even from bots that your robots.txt allows.
Visit https://yoursite.com/robots.txt and confirm that `User-agent: GPTBot` followed by `Disallow: /` appears.
```bash
# Simulate GPTBot fetching your homepage
curl -A "GPTBot" -I https://yoursite.com

# If blocked via nginx/middleware: expect 403 or 404
# If robots.txt only: expect 200 (robots.txt blocks crawling, not HTTP access)
```
```bash
# -E enables the | alternation in the pattern
grep -iE "gptbot|chatgpt" /var/log/nginx/access.log | tail -20
```
After a robots.txt block, GPTBot should request only /robots.txt and then stop.
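That check can be automated. The sketch below (hypothetical helper, assuming nginx combined log format) lists the paths a given bot requested, so you can confirm nothing beyond /robots.txt appears:

```typescript
// Sketch: extract the request paths a given bot fetched from
// combined-format access log lines.
function pathsRequestedBy(logLines: string[], bot: string): string[] {
  const pattern = /"(?:GET|HEAD|POST) (\S+) HTTP/;
  return logLines
    .filter((line) => line.includes(bot))
    .map((line) => line.match(pattern)?.[1] ?? '')
    .filter(Boolean);
}
```

If `pathsRequestedBy(lines, 'GPTBot')` returns anything other than `/robots.txt` entries after you deployed the block, the crawler is not honoring it and an HTTP-layer block is warranted.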
Run a free AI visibility check to see which crawlers can access your content — GPTBot, Google-Extended, ClaudeBot, and more.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.
Scan My Site Free →