PHP powers roughly three-quarters of websites whose server-side language is known — including millions on shared hosting where server config is locked down. The most portable option is a $_SERVER['HTTP_USER_AGENT'] check with preg_match() at the top of your front controller. On shared hosting (cPanel, Bluehost, SiteGround), .htaccess mod_rewrite is the only server-level option: no PHP changes or nginx access required.
| Method | When to use |
|---|---|
| Static robots.txt in document root | Always — zero PHP involved |
| Dynamic robots.php endpoint | Need staging vs production rules |
| Front controller $_SERVER check | Any PHP app with an index.php entry point |
| .htaccess mod_rewrite (Apache) | Apache / shared hosting (cPanel) — most common |
| noai meta tag in HTML layout | Any PHP template / layout file |
| nginx + PHP-FPM block | nginx serving PHP via php-fpm |
Place robots.txt in your public document root — the same directory as index.php (typically public/, public_html/, or httpdocs/). Apache and nginx serve static files directly — PHP is never invoked for this request.
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: webzio-extended
Disallow: /

User-agent: gemini-deep-research
Disallow: /

User-agent: *
Allow: /
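To confirm the static file is actually being served (and not intercepted by a framework route), request it directly from a terminal. `example.com` below is a placeholder for your own domain:

```shell
# The first few rules should echo back unchanged
curl -s https://example.com/robots.txt | head -n 4
# The response should be served as plain text
curl -sI https://example.com/robots.txt | grep -i content-type
```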
When you need environment-specific rules — block everything on staging, block only AI bots in production — route /robots.txt to a PHP script. The routing happens in .htaccess or nginx config.
<?php
$aiBotsDisallow = <<<'EOT'
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: *
Allow: /
EOT;
$blockAll = "User-agent: *
Disallow: /";
header('Content-Type: text/plain; charset=utf-8');
// Use APP_ENV environment variable or a constant defined in your config
$env = getenv('APP_ENV') ?: 'development';
echo ($env === 'production') ? $aiBotsDisallow : $blockAll;

Then route the request to the script in .htaccess:

RewriteEngine On
# Route /robots.txt to the PHP script (place before WordPress/framework rules)
RewriteRule ^robots\.txt$ robots.php [L]
Static vs dynamic: If a static robots.txt file exists in the same directory, Apache will serve the static file and ignore the RewriteRule. Remove the static file when using the dynamic PHP approach — or use RewriteCond %{REQUEST_FILENAME} !-f before the rule.
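A minimal sketch combining both behaviors: serve the static file when it exists, otherwise fall through to the dynamic script (this assumes robots.php sits in the same directory as the static robots.txt would):

```apache
RewriteEngine On
# Only rewrite when no static robots.txt file exists on disk
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^robots\.txt$ robots.php [L]
```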
Most PHP applications have a single entry point — index.php. Add a user-agent check at the very top before any output, framework bootstrap, or session start. Use http_response_code(403) and exit to stop execution immediately.
<?php
// ─── Block AI bots ────────────────────────────────────────────────────────────
// Run before framework bootstrap, session start, or any output.
// Always allow /robots.txt through so bots can read your opt-out rules.
if ($_SERVER['REQUEST_URI'] !== '/robots.txt') {
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if ($userAgent !== '' && preg_match(
'/GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|' .
'Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|' .
'Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|' .
'cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|' .
'webzio-extended|gemini-deep-research/i',
$userAgent
)) {
http_response_code(403);
header('Content-Type: text/plain');
exit('Forbidden');
}
}
// ─── Your application starts here ────────────────────────────────────────────
require_once __DIR__ . '/../vendor/autoload.php';
// ... rest of bootstrap

For multi-entry-point applications, extract to a dedicated file and require it at the top of each:
<?php
declare(strict_types=1);
/**
* Block known AI training and scraping bots.
* Include at the top of every entry point.
* Always allows /robots.txt through.
*/
function blockAiBots(): void
{
if (str_ends_with($_SERVER['REQUEST_URI'] ?? '', '/robots.txt')) {
return;
}
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
if ($ua === '') {
return;
}
$pattern =
'/GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|' .
'Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|' .
'Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|' .
'cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|' .
'webzio-extended|gemini-deep-research/i';
if (preg_match($pattern, $ua)) {
http_response_code(403);
header('Content-Type: text/plain');
exit('Forbidden');
}
}
blockAiBots();

preg_match vs strpos: preg_match() with /i (case-insensitive) is the right tool here — bot tokens are simple strings and you need substring matching. The overhead of a single regex match against a ~100-character user-agent string is negligible. Avoid chaining 25+ str_contains() calls; a single regex is cleaner and typically at least as fast.
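If you prefer to maintain the bot list as an array rather than a hand-joined pattern, the same check can be sketched like this. buildAiBotPattern() is an illustrative helper name, not part of the code above:

```php
<?php
// Sketch: build the single case-insensitive pattern from a maintainable array.
function buildAiBotPattern(array $botNames): string
{
    // preg_quote() escapes regex metacharacters (e.g. the '-' in "ChatGPT-User")
    $escaped = array_map(
        static fn (string $name): string => preg_quote($name, '/'),
        $botNames
    );
    return '/' . implode('|', $escaped) . '/i';
}

$bots = ['GPTBot', 'ChatGPT-User', 'ClaudeBot', 'CCBot', 'PerplexityBot'];
$pattern = buildAiBotPattern($bots);

if (preg_match($pattern, $_SERVER['HTTP_USER_AGENT'] ?? '')) {
    http_response_code(403);
    exit('Forbidden');
}
```

Adding or removing a bot is then a one-line array change, with no risk of breaking the regex syntax.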
On shared hosting (cPanel, Bluehost, SiteGround, DreamHost), .htaccess is the only server-level blocking option — you have no access to Apache's main config or nginx. This blocks before PHP runs.
RewriteEngine On
# ─── Block AI bots — place ABOVE any WordPress or framework RewriteRules ───
# [F] = 403 Forbidden, [L] = stop processing rules
# Always allow robots.txt through
RewriteRule ^robots\.txt$ - [L]
# Block AI training and scraping bots
RewriteCond %{HTTP_USER_AGENT} GPTBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ChatGPT-User [NC,OR]
RewriteCond %{HTTP_USER_AGENT} OAI-SearchBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ClaudeBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} anthropic-ai [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Google-Extended [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bytespider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} CCBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PerplexityBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} meta-externalagent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Amazonbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Applebot-Extended [NC,OR]
RewriteCond %{HTTP_USER_AGENT} xAI-Bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DeepSeekBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MistralBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Diffbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} cohere-ai [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AI2Bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} YouBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DuckAssistBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} omgili [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webzio-extended [NC,OR]
RewriteCond %{HTTP_USER_AGENT} gemini-deep-research [NC]
RewriteRule .* - [F,L]
# ─── WordPress / your framework rules below this line ───
# BEGIN WordPress
# ...

Rule order: .htaccess rules are processed top-to-bottom. Place the AI bot block before the # BEGIN WordPress block (or your framework's rewrite section). If WordPress or another tool regenerates your .htaccess, your block rules may be overwritten; check after any plugin update that touches .htaccess.
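You can verify the block from a terminal before trusting it. `example.com` is a placeholder for your domain; a blocked bot user agent should return 403, a normal browser user agent should return 200:

```shell
# Spoof a blocked AI bot user agent
curl -s -o /dev/null -w "%{http_code}\n" -A "GPTBot/1.0" https://example.com/
# Spoof a normal browser user agent
curl -s -o /dev/null -w "%{http_code}\n" -A "Mozilla/5.0" https://example.com/
```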
Condense to a single regex condition if you prefer a shorter file:
RewriteEngine On
RewriteRule ^robots\.txt$ - [L]
RewriteCond %{HTTP_USER_AGENT} "GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|YouBot|DuckAssistBot|omgili|webzio-extended|gemini-deep-research" [NC]
RewriteRule .* - [F,L]

Add noai and noimageai meta tags to your shared layout template. In plain PHP this is typically a header file included on every page. Use a variable or constant to allow per-page override.
<?php
// $robots — set per-page before including this header.
// Default: block AI training on all pages.
$robots = $robots ?? 'noai, noimageai';
?>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title><?= htmlspecialchars($pageTitle ?? 'My Site') ?></title>
<meta name="robots" content="<?= htmlspecialchars($robots) ?>">
</head>
<body>

Per-page usage, setting $robots before the header is included:

<?php
// This page blocks only AI image training, not AI text use
$robots = 'noimageai';
$pageTitle = 'My Blog Post';
require_once 'templates/header.php';
?>
<h1>Blog Post Title</h1>
<!-- ... -->
Send as an HTTP header for non-HTML responses (JSON APIs, XML sitemaps) or when you want to ensure delivery before the HTML is parsed:
<?php
// Send X-Robots-Tag on any response type (HTML, JSON, XML)
// Call before any output is sent
header('X-Robots-Tag: noai, noimageai');

On servers running nginx with PHP-FPM, add a map block for user-agent matching. The check runs at the nginx layer; PHP-FPM is never invoked for matched bots.
map $http_user_agent $block_ai_bot {
default 0;
~*GPTBot 1;
~*ChatGPT-User 1;
~*OAI-SearchBot 1;
~*ClaudeBot 1;
~*anthropic-ai 1;
~*Google-Extended 1;
~*Bytespider 1;
~*CCBot 1;
~*PerplexityBot 1;
~*meta-externalagent 1;
~*Amazonbot 1;
~*Applebot-Extended 1;
~*xAI-Bot 1;
~*DeepSeekBot 1;
~*MistralBot 1;
~*Diffbot 1;
~*cohere-ai 1;
~*AI2Bot 1;
~*YouBot 1;
~*DuckAssistBot 1;
~*omgili 1;
~*webzio-extended 1;
~*gemini-deep-research 1;
}
server {
listen 443 ssl;
server_name yoursite.com;
root /var/www/yoursite/public;
index index.php;
# Always serve static robots.txt
location = /robots.txt {
try_files $uri =404;
}
location / {
if ($block_ai_bot) {
return 403 "Forbidden";
}
try_files $uri $uri/ /index.php?$query_string;
}
location ~ \.php$ {
fastcgi_pass unix:/run/php/php8.2-fpm.sock;
fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
include fastcgi_params;
}
}

| Hosting type | Recommended method |
|---|---|
| Shared (cPanel, Bluehost etc.) | .htaccess mod_rewrite |
| VPS + Apache | .htaccess or vhost-level mod_rewrite |
| VPS + nginx + PHP-FPM | nginx map block |
| Docker container (PHP built-in server) | PHP front controller check |
| Managed (Kinsta, WP Engine) | PHP front controller check + robots.txt |
| Platform (Heroku, Railway, Render) | PHP front controller check |
Add a .htaccess mod_rewrite block above your existing rules (above # BEGIN WordPress if applicable). Use RewriteCond %{HTTP_USER_AGENT} with the bot name and [NC,OR] flags, then RewriteRule .* - [F,L] to return 403. This is the only server-level option on most shared hosts — no PHP or server admin access needed.
.htaccess blocks before PHP runs — more efficient and applies to all file types including images. PHP front controller blocking only applies to requests that reach PHP. Use .htaccess for Apache/shared hosting; use nginx map blocks for nginx servers; fall back to PHP $_SERVER checking for platforms where you have no server config access (Docker, Heroku, Railway).
preg_match() with a case-insensitive pattern against $_SERVER['HTTP_USER_AGENT']. Use a single regex with all bot names joined by | — one preg_match() call is simpler and typically at least as fast as 25 separate str_contains() calls. Always check that the user-agent string is non-empty before matching.
No. Google Search (Googlebot), Bing (Bingbot), and other search engine crawlers are not in the block list. The list specifically targets AI training bots (GPTBot, CCBot, ClaudeBot) and AI search bots where you choose not to appear in AI answers. Your robots.txt explicitly allows all unblocked bots with "User-agent: * Allow: /".
WordPress and plugins like Yoast regenerate the section between # BEGIN WordPress and # END WordPress markers. Place your AI bot block ABOVE the # BEGIN WordPress line — WordPress only modifies content between its markers, not above them. After adding the rules, verify they persist after a plugin update or permalink flush.