How to Block AI Bots on Nginx: Complete 2026 Guide
Nginx sits in front of everything — it's the first layer your traffic touches, which makes it the most powerful place to block AI crawlers. Whether you're serving a static site directly or running nginx as a reverse proxy in front of Node, Python, or PHP, the bot-blocking config is the same: a map block in your http {} context, a return 403 before the request reaches your origin, and add_header X-Robots-Tag on all responses.
The map block must go in http {} — not server {}
The most common nginx bot-blocking mistake: placing the map directive inside a server or location block. Nginx will refuse to start with a config error. The map directive belongs in the http {} context, defined once globally, then used inside any number of server blocks.
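A minimal before/after sketch of the placement rule (the error line is representative of what nginx -t reports for a misplaced directive; the exact file and line number will differ):

```nginx
# WRONG — map inside server {} makes "nginx -t" fail with an error like:
#   nginx: [emerg] "map" directive is not allowed here in /etc/nginx/nginx.conf:12
server {
    map $http_user_agent $bad_bot { default 0; ~*GPTBot 1; }  # config error
}

# RIGHT — map defined once in http {}, referenced from any server {}
http {
    map $http_user_agent $bad_bot { default 0; ~*GPTBot 1; }

    server {
        location / {
            if ($bad_bot) { return 403; }
        }
    }
}
```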
Methods at a glance
| Method | What it does | Where it lives |
|---|---|---|
| robots.txt location block | Signals bots which paths are off-limits | Webroot / root directive |
| map + if ($bad_bot) | Hard 403 on known AI User-Agents | http {} then server {} |
| add_header X-Robots-Tag | noai header on all HTTP responses | server {} or location {} |
| noai <meta> tag | AI training opt-out per HTML page | HTML files / layout template |
| limit_req_zone | Rate-limit to slow bot scraping | http {} then location {} |
| geo block | IP-range blocking (no if needed) | http {} context |
1. robots.txt — location block
Nginx serves files from the directory set by the root directive (e.g. /var/www/html). Place robots.txt in that directory, then add a dedicated location block so nginx handles it cleanly — no PHP, no upstream, no access log noise.
# nginx server block
server {
listen 443 ssl;
server_name example.com;
root /var/www/html;
# Exact-match location for robots.txt — fastest evaluation
location = /robots.txt {
try_files $uri =404;
access_log off; # don't pollute access logs
log_not_found off; # don't log 404 if absent
expires 1d;
add_header Cache-Control "public, max-age=86400";
}
}

Your robots.txt should explicitly disallow AI training crawlers:
# /var/www/html/robots.txt
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: *
Allow: /

robots.txt is advisory — compliant bots will respect it, aggressive scrapers will not. Use the map block below for hard enforcement.
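If you maintain the bot list in one place, a small shell loop can generate the file above (a sketch — the BOTS list mirrors this guide's examples; extend it as new crawlers appear):

```shell
#!/bin/sh
# Generate robots.txt from a bot list; names mirror the list above.
BOTS="GPTBot ChatGPT-User ClaudeBot Claude-Web anthropic-ai CCBot Google-Extended PerplexityBot Amazonbot Bytespider"
{
  for bot in $BOTS; do
    printf 'User-agent: %s\nDisallow: /\n\n' "$bot"
  done
  printf 'User-agent: *\nAllow: /\n'
} > robots.txt
```

Drop the result into the directory your root directive points at (e.g. /var/www/html).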
2. Hard 403 blocking — map block
The map directive matches $http_user_agent against a list of patterns and sets a variable. Nginx evaluates map lazily — only when the variable is first used — so it adds no overhead for normal requests. The map block must be inside http {}, not inside server {} or location {}.
# /etc/nginx/nginx.conf (or included .conf in http block)
http {
# ── AI bot User-Agent map ──────────────────────────────────────────
# Must be inside http {}, NOT inside server {} or location {}
map $http_user_agent $bad_bot {
default 0; # allow everything by default
~*GPTBot 1;
~*ChatGPT-User 1;
~*ClaudeBot 1;
~*Claude-Web 1;
~*anthropic-ai 1;
~*CCBot 1;
~*Google-Extended 1;
~*PerplexityBot 1;
~*Amazonbot 1;
~*Bytespider 1;
~*YouBot 1;
~*Applebot 1;
~*DuckAssistBot 1;
~*meta-externalagent 1;
~*MistralAI-Spider 1;
~*oai-searchbot 1;
}
include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-enabled/*;
}

Inside each server block, check $bad_bot and return 403. Always exempt /robots.txt so compliant bots can still read your directives:
server {
listen 443 ssl;
server_name example.com;
root /var/www/html;
# ── robots.txt — exempt from bot blocking ──────────────────────────
location = /robots.txt {
try_files $uri =404;
access_log off;
log_not_found off;
}
# ── Block known AI bots ────────────────────────────────────────────
# "if" is safe here — we're only returning a status, not using
# proxy_pass, rewrite, or other directives that interact poorly with if
location / {
if ($bad_bot) {
return 403 "Forbidden";
}
# ... your normal config (try_files, proxy_pass, etc.)
try_files $uri $uri/ /index.html;
}
}

On using if in nginx
Nginx's own docs warn against if because it interacts badly with proxy_pass and rewrite. For a bare return 403 with no other directives in the same block, if is safe and correct. If you want to avoid if entirely, use a geo block for IP-based blocking instead (see the FAQ below).
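To sanity-check which User-Agent strings the map's ~* patterns will catch, grep -Ei approximates the same case-insensitive substring matching (check_ua is a throwaway helper for this sketch, not part of nginx; the pattern list mirrors the map above):

```shell
#!/bin/sh
# Approximate nginx's ~* matching (case-insensitive regex) with grep -Ei.
BAD_BOTS='GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider'
check_ua() {
  printf '%s' "$1" | grep -qEi "$BAD_BOTS" && echo blocked || echo allowed
}

check_ua "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"  # blocked
check_ua "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0 Safari/537.36"        # allowed
check_ua "mozilla/5.0 claudebot"                                             # blocked — ~* is case-insensitive
```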
3. noai meta tag — static HTML
Nginx does not modify HTML content — it serves files as-is. For a static site, add the noai meta tag directly to every HTML file, or (better) to your base layout template in your SSG of choice (Hugo, Eleventy, Jekyll, Astro).
<!-- In your HTML <head> -->
<meta name="robots" content="noai, noimageai">
<!-- Or combined with other directives: -->
<meta name="robots" content="index, follow, noai, noimageai">

For SSG base layout templates:
<!-- Hugo: layouts/_default/baseof.html -->
<head>
<meta name="robots" content="{{ with .Params.robots }}{{ . }}{{ else }}noai, noimageai{{ end }}">
</head>
<!-- Eleventy: _includes/base.njk -->
<head>
<meta name="robots" content="{{ robots | default('noai, noimageai') }}">
</head>
<!-- Jekyll: _layouts/default.html -->
<head>
<meta name="robots" content="{{ page.robots | default: 'noai, noimageai' }}">
</head>

The HTTP-layer equivalent is X-Robots-Tag (Section 4) — set via nginx add_header, no HTML changes needed.
4. X-Robots-Tag — add_header
X-Robots-Tag is the HTTP-header equivalent of the noai meta tag — useful for non-HTML resources (PDFs, images, API responses) and for sites where you can't easily modify HTML. The always keyword is critical: without it nginx only sends the header on 2xx/3xx responses.
server {
listen 443 ssl;
server_name example.com;
# Add X-Robots-Tag to ALL responses (including 4xx/5xx)
# "always" is required — without it, header is only sent on 2xx/3xx
add_header X-Robots-Tag "noai, noimageai" always;
# For HTML pages only (skip on API/JSON endpoints):
location ~* \.html$ {
add_header X-Robots-Tag "noai, noimageai" always;
try_files $uri =404;
}
}

add_header inheritance gotcha
In nginx, if a block defines any add_header directive, it replaces (not appends to) all inherited add_header directives from parent blocks. If your location blocks already have add_header directives (e.g. CORS headers), repeat the X-Robots-Tag header in those blocks too, or use ngx_http_headers_more_module (more_set_headers) which appends instead.
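A sketch of the gotcha and its fix (the CORS header here is a stand-in for any pre-existing add_header in your config):

```nginx
server {
    add_header X-Robots-Tag "noai, noimageai" always;

    location /api/ {
        # This block's own add_header discards ALL inherited headers,
        # so X-Robots-Tag must be repeated here:
        add_header Access-Control-Allow-Origin "*" always;
        add_header X-Robots-Tag "noai, noimageai" always;
        proxy_pass http://127.0.0.1:3000;
    }

    location / {
        # No add_header here — the server-level X-Robots-Tag is inherited.
        try_files $uri $uri/ /index.html;
    }
}
```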
5. Rate limiting — limit_req_zone
Rate limiting catches scrapers that rotate User-Agents or use unknown bot strings. The limit_req_zone directive lives in http {}; the limit_req directive applies it inside location {}.
# In http {} block:
# 10 MB zone keyed by client IP, max 10 requests/second
limit_req_zone $binary_remote_addr zone=general:10m rate=10r/s;
# Stricter zone for paths that bots love to hammer
limit_req_zone $binary_remote_addr zone=content:10m rate=2r/s;
# Apply in server block:
server {
listen 443 ssl;
location / {
limit_req zone=general burst=20 nodelay;
limit_req_status 429; # Return 429 Too Many Requests
if ($bad_bot) { return 403; }
try_files $uri $uri/ /index.html;
}
# Stricter limit on content-heavy paths
location /blog {
limit_req zone=content burst=5 nodelay;
limit_req_status 429;
try_files $uri $uri/ =404;
}
}

burst allows short traffic spikes above the rate; nodelay processes burst requests immediately (vs queuing them). Without limit_req_status, nginx returns 503 — set it to 429 for correct semantics.
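Before enforcing a new zone on production traffic, nginx 1.17.1+ can trial it in dry-run mode — would-be rejections are logged but requests pass through (a sketch reusing the general zone above):

```nginx
location / {
    limit_req zone=general burst=20 nodelay;
    limit_req_dry_run on;   # rejections are logged, not enforced
    limit_req_status 429;
    try_files $uri $uri/ /index.html;
}
```

Watch the error log for a day, tune rate and burst, then remove the dry-run line to enforce.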
6. Reverse proxy setup
When nginx fronts a Node, Python, PHP, or other upstream server, the bot check fires before proxy_pass — blocked requests never reach your origin. This is the most effective architecture for high-traffic sites: nginx handles the rejection at near-zero cost.
server {
listen 443 ssl;
server_name example.com;
# Headers for upstream to identify real client IP
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
add_header X-Robots-Tag "noai, noimageai" always;
location = /robots.txt {
root /var/www/html;
try_files $uri =404;
access_log off;
}
location / {
# Bot check fires BEFORE proxy_pass — blocked bots never reach origin
if ($bad_bot) {
return 403 "Forbidden";
}
proxy_pass http://127.0.0.1:3000;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_read_timeout 60s;
}
}

7. Full nginx.conf example
A complete production-ready config combining all techniques above — map block in http {}, 403 blocking, robots.txt, X-Robots-Tag, and rate limiting. Works for both static sites and reverse proxy setups.
# /etc/nginx/nginx.conf
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
sendfile on;
keepalive_timeout 65;
# ── AI bot User-Agent map ──────────────────────────────────────────
# MUST be in http {} — not in server {} or location {}
map $http_user_agent $bad_bot {
default 0;
~*GPTBot 1;
~*ChatGPT-User 1;
~*ClaudeBot 1;
~*Claude-Web 1;
~*anthropic-ai 1;
~*CCBot 1;
~*Google-Extended 1;
~*PerplexityBot 1;
~*Amazonbot 1;
~*Bytespider 1;
~*YouBot 1;
~*Applebot 1;
~*DuckAssistBot 1;
~*meta-externalagent 1;
~*MistralAI-Spider 1;
~*oai-searchbot 1;
}
# ── Rate limiting zones ─────────────────────────────────────────────
limit_req_zone $binary_remote_addr zone=general:10m rate=10r/s;
# ── Redirect HTTP → HTTPS ───────────────────────────────────────────
server {
listen 80;
server_name example.com www.example.com;
return 301 https://example.com$request_uri;
}
# ── Main HTTPS server ───────────────────────────────────────────────
server {
listen 443 ssl http2;
server_name example.com;
root /var/www/html;
ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
ssl_protocols TLSv1.2 TLSv1.3;
# X-Robots-Tag on all responses
add_header X-Robots-Tag "noai, noimageai" always;
# ── robots.txt ──────────────────────────────────────────────────
location = /robots.txt {
try_files $uri =404;
access_log off;
log_not_found off;
expires 1d;
}
# ── All other requests ───────────────────────────────────────────
location / {
limit_req zone=general burst=20 nodelay;
limit_req_status 429;
# Block known AI bots — fires before proxy_pass / try_files
if ($bad_bot) {
return 403 "Forbidden";
}
# Static site:
try_files $uri $uri/ /index.html;
# Reverse proxy (comment out try_files, uncomment these):
# proxy_pass http://127.0.0.1:3000;
# proxy_http_version 1.1;
# proxy_set_header Host $host;
# proxy_set_header X-Real-IP $remote_addr;
# proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
# proxy_set_header X-Forwarded-Proto $scheme;
}
}
}

8. Docker deployment
Mount your nginx config and webroot as volumes, or bake them into the image for immutable deployments. The official nginx:alpine image is the standard choice — ~25 MB.
# Dockerfile — baked config (immutable, good for CI/CD)
FROM nginx:alpine
# Remove default config
RUN rm /etc/nginx/conf.d/default.conf
# Copy your config and webroot
COPY nginx.conf /etc/nginx/nginx.conf
COPY dist/ /var/www/html/
EXPOSE 80 443
CMD ["nginx", "-g", "daemon off;"]

# docker-compose.yml — volume-mounted config (easier to update)
services:
nginx:
image: nginx:alpine
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf:ro
- ./dist:/var/www/html:ro
- ./certs:/etc/letsencrypt:ro
restart: unless-stopped

Test your config before reloading — always:
# Inside the container:
nginx -t # test config syntax
nginx -s reload # reload without downtime
# From host:
docker exec nginx-container nginx -t
docker exec nginx-container nginx -s reload

9. Ubuntu / Debian setup
On a bare-metal or VPS server, split your config across /etc/nginx/conf.d/ files for clarity. Keep the map block in a dedicated file included from the main nginx.conf.
# Install nginx
sudo apt update && sudo apt install -y nginx
# Create the bot map config (included from nginx.conf http block)
sudo tee /etc/nginx/conf.d/bot-map.conf > /dev/null <<'EOF'
map $http_user_agent $bad_bot {
default 0;
~*GPTBot 1;
~*ClaudeBot 1;
~*CCBot 1;
~*Google-Extended 1;
~*PerplexityBot 1;
~*Amazonbot 1;
~*Bytespider 1;
}
EOF
# Create site config
sudo tee /etc/nginx/sites-available/example.com > /dev/null <<'EOF'
server {
listen 80;
server_name example.com;
root /var/www/html/example.com;
add_header X-Robots-Tag "noai, noimageai" always;
location = /robots.txt {
try_files $uri =404;
access_log off;
}
location / {
if ($bad_bot) { return 403; }
try_files $uri $uri/ /index.html;
}
}
EOF
# Enable site
sudo ln -s /etc/nginx/sites-available/example.com /etc/nginx/sites-enabled/
# Test and reload
sudo nginx -t && sudo systemctl reload nginx

Frequently asked questions
Where does the map block go in nginx.conf?
Inside http {} — not inside server {} or location {}. A map directive at the server or location level causes an nginx config error on startup. Keep it in /etc/nginx/conf.d/bot-map.conf (included from the http block) for clean organisation.
Is "if ($bad_bot)" safe to use in nginx?
Yes, when used only to return a status code. Nginx if is dangerous when combined with proxy_pass, rewrite, or set — not for a plain return 403. If you want to avoid if entirely, use a geo block for IP-based blocking instead.
Does add_header X-Robots-Tag work for error responses?
Only with the always keyword: add_header X-Robots-Tag "noai, noimageai" always. Without always, nginx sends the header only on 2xx and 3xx responses — 4xx and 5xx responses omit it. Also remember the inheritance rule: a child block with any add_header replaces all inherited ones.
How do I block bots on nginx without if?
Use a geo block (in http {}) to map IP ranges to a variable, then check that variable. For User-Agent-based blocking without if, you can use a map + a named location with return 403, redirected from the main location via try_files — but this is more complex than the simple if ($bad_bot) pattern which is safe for this use case.
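A sketch of the geo approach (the CIDR ranges below are documentation TEST-NET placeholders — substitute the ranges you actually want to block). Note that checking the geo variable still uses the safe return-only if pattern; for plain IP bans with no variable at all, the access module's deny directive is the simplest route:

```nginx
# In http {} — geo maps the client IP against CIDR ranges
geo $blocked_ip {
    default          0;
    203.0.113.0/24   1;
    198.51.100.0/24  1;
}

# In server {} — return-only if on the geo variable
server {
    if ($blocked_ip) { return 403; }
}

# Or, fully if-free: ngx_http_access_module
server {
    deny 203.0.113.0/24;
    deny 198.51.100.0/24;
}
```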
Does nginx bot blocking work as a reverse proxy?
Yes — and it's the most effective placement. The if ($bad_bot) { return 403; } check fires before proxy_pass, so blocked bots never reach your Node/Python/PHP upstream. This reduces origin load and protects your app server from bot traffic at near-zero nginx overhead.
How do I add noai meta tags on a static site served by nginx?
Nginx serves HTML files as-is — it doesn't inject content. Add <meta name="robots" content="noai, noimageai"> to the <head> of your HTML files, or to the base layout in your SSG (Hugo, Eleventy, Jekyll). The HTTP-layer alternative is add_header X-Robots-Tag — no HTML edits needed.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.