How to Block AI Bots on Apache: Complete 2026 Guide
Apache's module system gives you two clean paths to block AI crawlers: mod_rewrite (RewriteCond on HTTP_USER_AGENT) and mod_setenvif (BrowserMatch + Require not env). Both return a hard 403 before your application code runs. Header always set X-Robots-Tag from mod_headers handles the HTTP-layer training opt-out. And critically: always configure in VirtualHost, not .htaccess — .htaccess is re-parsed on every request.
Use VirtualHost, not .htaccess — unless on shared hosting
Apache reads and parses .htaccess on every single request — including the bot requests you're trying to block. VirtualHost config in httpd.conf or /etc/apache2/sites-available/ is parsed once at startup. .htaccess also requires AllowOverride All, which lets anyone who can write files into your webroot change server behavior. Only use .htaccess on shared hosting where you have no VirtualHost access.
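If you're moving config out of .htaccess, it's worth checking whether stray copies remain under the webroot, since Apache will still parse them per request wherever overrides are allowed. A minimal check (the /var/www path is an assumption; adjust to your webroot):

```shell
# List stray .htaccess files Apache would re-parse on every request
WEBROOT="${WEBROOT:-/var/www}"
find "$WEBROOT" -name .htaccess -type f 2>/dev/null || true
```

Any file this prints is being parsed on every matching request until you move its rules into the VirtualHost and set AllowOverride None.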
Methods at a glance
| Method | What it does | Module required |
|---|---|---|
| robots.txt in DocumentRoot | Signals bots which paths are off-limits | Built-in |
| mod_rewrite + RewriteCond | Hard 403 on known AI User-Agents | mod_rewrite |
| mod_setenvif + BrowserMatch | Set env var → Require env (modern) | mod_setenvif + mod_authz_core |
| Header always set X-Robots-Tag | noai header on all HTTP responses | mod_headers |
| noai <meta> tag | AI training opt-out per HTML page | Your app / HTML files |
| mod_evasive | Rate limit to catch UA-rotating bots | mod_evasive |
1. robots.txt — DocumentRoot
Place robots.txt in your DocumentRoot directory (e.g. /var/www/html/). Apache serves it automatically. An optional Location block makes the handling explicit and ensures the file is always served directly by the default handler:
# VirtualHost config
<VirtualHost *:443>
ServerName example.com
DocumentRoot /var/www/html
# Clean handling for robots.txt
<Location "/robots.txt">
SetHandler default-handler
Options None
Require all granted
</Location>
</VirtualHost>

# /var/www/html/robots.txt
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: *
Allow: /

2. Hard 403 — mod_rewrite
mod_rewrite evaluates RewriteCond conditions before the RewriteRule. Match the HTTP_USER_AGENT server variable with a pipe-separated regex of known AI bot strings, then return a 403 with the [F] (Forbidden) flag. The [NC] flag makes the match case-insensitive.
# VirtualHost config (or .htaccess on shared hosting)
<VirtualHost *:443>
ServerName example.com
DocumentRoot /var/www/html
<IfModule mod_rewrite.c>
RewriteEngine On
# Allow robots.txt through — checked first
RewriteRule ^/robots\.txt$ - [L]
# Block known AI bots — [F] returns 403, [L] stops processing, [NC] case-insensitive
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (YouBot|Applebot-Extended|DuckAssistBot|meta-externalagent|MistralAI-Spider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (oai-searchbot) [NC]
RewriteRule .* - [F,L]
</IfModule>
</VirtualHost>

RewriteCond flag syntax

- [NC] — case-insensitive match
- [OR] — OR this condition with the next one (default is AND)
- [F] — return 403 Forbidden immediately
- [L] — last rule, stop processing
- The final RewriteCond has no [OR] — that's intentional: [OR] chains a condition to the one that follows it, so all four conditions form a single OR group ending at the last one. If any condition matches, the RewriteRule fires.
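You can sanity-check the alternation itself before reloading Apache. grep -E uses a compatible dialect for a plain alternation like this (a sketch; the User-Agent string below is illustrative):

```shell
# Test a User-Agent string against the same alternation used in the RewriteCond
UA="Mozilla/5.0; compatible; GPTBot/1.2"
PATTERN='GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider'
if printf '%s' "$UA" | grep -qiE "$PATTERN"; then
    echo "would be blocked (403)"
else
    echo "would pass through"
fi
```

Once the config is live, a request sent with curl -I -A "GPTBot" https://example.com/ should come back 403, while a normal browser User-Agent gets 200.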
3. mod_setenvif — BrowserMatch (Apache 2.4)
The modern Apache 2.4 approach uses BrowserMatch from mod_setenvif to set an environment variable, then denies access with Require not env from mod_authz_core. This avoids mod_rewrite entirely.
<VirtualHost *:443>
ServerName example.com
DocumentRoot /var/www/html
# BrowserMatch sets an env variable when User-Agent matches
# BrowserMatch is case-sensitive; use BrowserMatchNoCase if you need case-insensitive matching
BrowserMatch "GPTBot" bad_bot
BrowserMatch "ChatGPT-User" bad_bot
BrowserMatch "ClaudeBot" bad_bot
BrowserMatch "Claude-Web" bad_bot
BrowserMatch "anthropic-ai" bad_bot
BrowserMatch "CCBot" bad_bot
BrowserMatch "Google-Extended" bad_bot
BrowserMatch "PerplexityBot" bad_bot
BrowserMatch "Amazonbot" bad_bot
BrowserMatch "Bytespider" bad_bot
BrowserMatch "YouBot" bad_bot
BrowserMatch "DuckAssistBot" bad_bot
BrowserMatch "meta-externalagent" bad_bot
BrowserMatch "MistralAI-Spider" bad_bot
BrowserMatch "oai-searchbot" bad_bot
# Apply to all paths except robots.txt
<LocationMatch "^(?!/robots\.txt)">
<RequireAll>
Require all granted
Require not env bad_bot
</RequireAll>
</LocationMatch>
</VirtualHost>

RequireAll with Require not env bad_bot returns 403 when the variable is set. This is the cleanest approach on Apache 2.4+.
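The fifteen directives can also be collapsed into a single line. BrowserMatchNoCase (also from mod_setenvif) makes the case-insensitivity explicit — a sketch using the same bot list:

```apache
# Equivalent one-liner, case-insensitive via BrowserMatchNoCase
BrowserMatchNoCase "(GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider|YouBot|DuckAssistBot|meta-externalagent|MistralAI-Spider|oai-searchbot)" bad_bot
```

One line is easier to keep in sync with the mod_rewrite regex when the bot list changes.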
4. X-Robots-Tag — mod_headers
The Header directive from mod_headers adds HTTP response headers. always is required — without it Apache only sends the header on 2xx responses. Enable the module first: sudo a2enmod headers.
<VirtualHost *:443>
ServerName example.com
# Add to ALL responses (2xx + 4xx + 5xx)
# "always" is required — without it, header only sent on 2xx
Header always set X-Robots-Tag "noai, noimageai"
# Or scope it to HTML only (exclude API/JSON endpoints):
<FilesMatch "\.(html|htm)$">
Header always set X-Robots-Tag "noai, noimageai"
</FilesMatch>
</VirtualHost>

# Enable mod_headers on Debian/Ubuntu:
sudo a2enmod headers
sudo systemctl reload apache2
# Verify it's loaded:
apache2ctl -M | grep headers

5. noai meta tag — static HTML
Apache does not modify HTML content. For static sites served by Apache, add the noai meta tag directly in HTML or via your SSG base layout. For PHP, add it in your main layout file.
<!-- In your HTML <head> -->
<meta name="robots" content="noai, noimageai">
<!-- PHP layout (e.g. header.php): -->
<?php $robots = $robots ?? 'noai, noimageai'; ?>
<meta name="robots" content="<?= htmlspecialchars($robots) ?>">
<!-- WordPress: add to <head> in functions.php -->
add_action('wp_head', function() {
echo '<meta name="robots" content="noai, noimageai">' . "\n";
});

Use X-Robots-Tag (Section 4) as the HTTP-layer equivalent when you can't modify HTML files.
6. Rate limiting — mod_evasive
mod_evasive blocks IPs that exceed request thresholds — catching bots that rotate User-Agents to evade string matching. It's a DoS/scraping mitigation layer, not a replacement for User-Agent blocking.
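One caveat before tuning thresholds: aggressive limits can also trip your own monitoring or a load balancer health check. mod_evasive's DOSWhitelist directive exempts trusted sources (the addresses below are placeholders):

```apache
<IfModule mod_evasive20.c>
    # Exempt trusted sources (monitoring, load balancer) from rate limiting
    DOSWhitelist 127.0.0.1
    DOSWhitelist 10.0.0.*
</IfModule>
```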
# Install on Debian/Ubuntu:
sudo apt install libapache2-mod-evasive
sudo a2enmod evasive
# /etc/apache2/mods-available/evasive.conf
<IfModule mod_evasive20.c>
# Requests to same page within DOSPageInterval → trigger block
DOSPageCount 5
DOSPageInterval 1
# Total requests to site within DOSSiteInterval → trigger block
DOSSiteCount 50
DOSSiteInterval 1
# How long (seconds) IP stays blocked
DOSBlockingPeriod 10
# Log blocked IPs
DOSLogDir /var/log/mod_evasive
# DOSEmailNotify admin@example.com
</IfModule>

# Create log dir with correct permissions:
sudo mkdir -p /var/log/mod_evasive
sudo chown www-data:www-data /var/log/mod_evasive
sudo systemctl reload apache2

7. Reverse proxy setup
Apache can front a Node, Python, or other backend with mod_proxy + mod_proxy_http. The bot check (mod_rewrite or BrowserMatch) fires before ProxyPass — blocked bots never reach your backend.
# Enable required modules:
# sudo a2enmod proxy proxy_http rewrite headers
<VirtualHost *:443>
ServerName example.com
ProxyPreserveHost On
# mod_proxy_http adds X-Forwarded-For with the real client IP automatically
RequestHeader set X-Forwarded-Proto "https"
Header always set X-Robots-Tag "noai, noimageai"
# robots.txt — serve locally, don't proxy
Alias /robots.txt /var/www/html/robots.txt
<Location "/robots.txt">
SetHandler default-handler
Require all granted
</Location>
<IfModule mod_rewrite.c>
RewriteEngine On
# Bot check fires BEFORE ProxyPass — blocked bots never hit backend
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (YouBot|DuckAssistBot|meta-externalagent|MistralAI-Spider|oai-searchbot) [NC]
RewriteRule .* - [F,L]
# Proxy to backend
RewriteRule ^/(.*) http://127.0.0.1:3000/$1 [P,L]
</IfModule>
# Or using ProxyPass directly (after the RewriteRule block):
# ProxyPass / http://127.0.0.1:3000/
# ProxyPassReverse / http://127.0.0.1:3000/
</VirtualHost>

8. Full VirtualHost example
A complete production VirtualHost combining robots.txt, mod_rewrite blocking, X-Robots-Tag, and SSL. Save to /etc/apache2/sites-available/example.com.conf.
# /etc/apache2/sites-available/example.com.conf
# HTTP → HTTPS redirect
<VirtualHost *:80>
ServerName example.com
Redirect permanent / https://example.com/
</VirtualHost>
# HTTPS VirtualHost
<VirtualHost *:443>
ServerName example.com
DocumentRoot /var/www/html/example.com
# SSL
SSLEngine on
SSLCertificateFile /etc/letsencrypt/live/example.com/fullchain.pem
SSLCertificateKeyFile /etc/letsencrypt/live/example.com/privkey.pem
# Security headers + AI training opt-out
Header always set X-Robots-Tag "noai, noimageai"
Header always set X-Content-Type-Options "nosniff"
# robots.txt — always serve directly
<Location "/robots.txt">
SetHandler default-handler
Require all granted
</Location>
<Directory "/var/www/html/example.com">
Options -Indexes +FollowSymLinks
# Never AllowOverride All in production
AllowOverride None
Require all granted
<IfModule mod_rewrite.c>
RewriteEngine On
# Skip robots.txt
RewriteRule ^robots\.txt$ - [L]
# Block AI bots
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (YouBot|Applebot-Extended|DuckAssistBot|meta-externalagent) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (MistralAI-Spider|oai-searchbot) [NC]
RewriteRule .* - [F,L]
# SPA fallback (if serving a React/Vue/Angular app):
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ /index.html [L]
</IfModule>
</Directory>
ErrorLog ${APACHE_LOG_DIR}/example.com-error.log
CustomLog ${APACHE_LOG_DIR}/example.com-access.log combined
</VirtualHost>

# Enable site and reload
sudo a2ensite example.com.conf
sudo a2enmod rewrite headers ssl
sudo apache2ctl configtest # test before reload
sudo systemctl reload apache2

9. Docker deployment
The official httpd:alpine image is the standard choice. Bake your config into the image for immutable deployments, or volume-mount for easier iteration.
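One gotcha: in the stock httpd.conf shipped with the official httpd image, mod_rewrite (and typically mod_headers) are commented out, so the config you COPY into the image must load the modules it uses. A sketch, with module paths as laid out in the official image (verify with httpd -M inside the container):

```apache
# In the httpd.conf you copy into the image
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule headers_module modules/mod_headers.so
LoadModule setenvif_module modules/mod_setenvif.so
```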
# Dockerfile
FROM httpd:alpine
# Copy custom config (httpd.conf) and webroot
COPY httpd.conf /usr/local/apache2/conf/httpd.conf
COPY dist/ /usr/local/apache2/htdocs/
EXPOSE 80
CMD ["httpd-foreground"]

# docker-compose.yml
services:
apache:
image: httpd:alpine
ports:
- "80:80"
- "443:443"
volumes:
- ./httpd.conf:/usr/local/apache2/conf/httpd.conf:ro
- ./dist:/usr/local/apache2/htdocs:ro
restart: unless-stopped

# Test config inside container:
docker exec apache-container httpd -t
docker exec apache-container httpd -k graceful

Frequently asked questions
Should I block AI bots in .htaccess or VirtualHost?
Always prefer VirtualHost config in httpd.conf or sites-available/. Apache reads and parses .htaccess on every request (including the bot requests you're trying to block), and requires AllowOverride All which is a security risk. Use .htaccess only on shared hosting where VirtualHost access isn't available.
What is the difference between mod_rewrite and mod_setenvif?
mod_setenvif (BrowserMatch) sets an environment variable and pairs with Require not env bad_bot — the cleaner Apache 2.4 approach with no regex flags to remember. mod_rewrite uses RewriteCond %{HTTP_USER_AGENT} + the [F] flag — more verbose but works in both VirtualHost and .htaccess, and handles the robots.txt exemption more explicitly.
How do I add X-Robots-Tag headers in Apache?
Header always set X-Robots-Tag "noai, noimageai" in VirtualHost, with mod_headers enabled (sudo a2enmod headers). The always keyword sends the header on all responses including 4xx/5xx. Without it, Apache only sends the header on 2xx responses.
How do I serve robots.txt from Apache?
Place it in your DocumentRoot — Apache serves it automatically. Optionally add a <Location "/robots.txt"> block with SetHandler default-handler so the file is always served directly, and set AllowOverride None in your <Directory> block to prevent .htaccess interference.
Does Apache bot blocking work as a reverse proxy?
Yes. With mod_proxy + mod_proxy_http, place the mod_rewrite or BrowserMatch bot check before the ProxyPass directive. Blocked bots receive a 403 from Apache without the request ever reaching your backend. Add ProxyPreserveHost On; mod_proxy_http forwards the real client IP in X-Forwarded-For automatically.
What is mod_evasive and should I use it?
mod_evasive blocks IPs that exceed request thresholds (DOSPageCount per URL, DOSSiteCount per site). It catches bots that rotate User-Agents to evade string matching. It complements mod_rewrite/BrowserMatch — use both. Install with sudo apt install libapache2-mod-evasive && sudo a2enmod evasive.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.