How to Block AI Bots on NGINX Unit: Complete 2026 Guide
NGINX Unit is a modern, polyglot application server from NGINX Inc. Unlike traditional web servers, it is configured entirely through a JSON REST API — no config file syntax, no restarts. It runs Python, Node.js, Go, PHP, Ruby, Java, and Perl applications natively. Bot blocking uses Unit's routing system with header match conditions and return actions.
How Unit routing works
NGINX Unit processes requests through routes — a JSON array of steps evaluated in order. Each step has an optional match object (conditions) and an action object (what to do). The first step whose match passes wins.
{
"routes": [
{
"match": { /* conditions */ },
"action": { /* pass / return / share */ }
},
{
/* no match = matches everything (default route) */
"action": { "pass": "applications/myapp" }
}
]
}
For bot blocking, add a step with a match on the User-Agent header and an action of {"return": 403}, placed before the application pass step.
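The first-match-wins evaluation can be illustrated with a small Python model. This is a simplified sketch, not Unit's implementation: the function names (wildcard_match, step_matches, select_action) are hypothetical, only headers conditions with * wildcards are handled, names and values are compared exactly as written, and a request that matches no step simply yields None.

```python
import re

# Two-step route list mirroring the structure described above.
ROUTES = [
    {
        "match": {"headers": {"User-Agent": ["*GPTBot*", "*ClaudeBot*"]}},
        "action": {"return": 403},
    },
    {"action": {"pass": "applications/myapp"}},  # no match = default route
]


def wildcard_match(pattern, value):
    """Return True if value fits pattern, where * stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def step_matches(step, headers):
    """A step without a match object accepts every request."""
    match = step.get("match")
    if match is None:
        return True
    for name, patterns in match.get("headers", {}).items():
        if isinstance(patterns, str):
            patterns = [patterns]
        if name not in headers:
            return False
        # Array values are OR-ed: one matching pattern is enough.
        if not any(wildcard_match(p, headers[name]) for p in patterns):
            return False
    return True


def select_action(routes, headers):
    """Walk the steps in order and return the action of the first one that matches."""
    for step in routes:
        if step_matches(step, headers):
            return step["action"]
    return None


if __name__ == "__main__":
    print(select_action(ROUTES, {"User-Agent": "GPTBot/1.0"}))   # {'return': 403}
    print(select_action(ROUTES, {"User-Agent": "Mozilla/5.0"}))  # {'pass': 'applications/myapp'}
```

Swapping the two steps in ROUTES would send every request to the application, which is why the blocking step has to come first.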
Bot blocking via header match
Wildcard pattern matching (simplest)
{
"routes": [
{
"match": {
"headers": {
"User-Agent": [
"*GPTBot*",
"*ClaudeBot*",
"*anthropic-ai*",
"*CCBot*",
"*Google-Extended*",
"*AhrefsBot*",
"*Bytespider*",
"*Amazonbot*",
"*Diffbot*",
"*FacebookBot*",
"*cohere-ai*",
"*PerplexityBot*",
"*YouBot*"
]
}
},
"action": {
"return": 403
}
},
{
"action": {
"pass": "applications/myapp"
}
}
]
}
In a match string, * matches any sequence of characters (including none), so "*GPTBot*" matches any User-Agent containing GPTBot anywhere in the string. Array values in a match condition use OR logic: if any pattern matches, the condition is true.
Regex matching (more precise)
{
"routes": [
{
"match": {
"headers": {
"User-Agent": "~(?i)(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot)"
}
},
"action": {
"return": 403
}
},
{
"action": {
"pass": "applications/myapp"
}
}
]
}
Prefix the pattern with ~ to use it as a PCRE regular expression. The (?i) flag makes the match case-insensitive. Without the ~ prefix, the string is treated as a literal pattern with * wildcards only.
Custom response body for blocked bots
{
"match": {
"headers": {
"User-Agent": ["*GPTBot*", "*ClaudeBot*", "*anthropic-ai*"]
}
},
"action": {
"return": 403,
"response_headers": {
"Content-Type": "text/plain"
}
}
}
The return action sends the HTTP status code and headers, but does not support a custom response body in the route config. For a custom body, forward blocked requests to a small application that returns the 403 with a body, or use an upstream nginx instance for the body content.
Applying config via the control API
NGINX Unit's configuration is managed through a Unix socket REST API. No restart required — changes take effect immediately.
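For scripted deployments, the same socket can be reached from Python's standard library. The following is a minimal sketch, not an official client: the class UnixHTTPConnection and the helper control_request are hypothetical names, the default socket path is the one used in the curl examples in this guide, and the caller is assumed to have permission to open the socket.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def control_request(method, path, body=None,
                    socket_path="/var/run/control.unit.sock"):
    """Send one request to the control socket; return (status, decoded JSON)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        payload = None if body is None else json.dumps(body).encode()
        conn.request(method, path, body=payload)
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read() or b"{}")
    finally:
        conn.close()


if __name__ == "__main__":
    # Read the current routes, equivalent to the curl GET shown in this guide.
    status, routes = control_request("GET", "/config/routes")
    print(status, json.dumps(routes, indent=2))
```

A PUT works the same way, for example control_request("PUT", "/config/routes", body=new_routes), where new_routes is a Python list of route steps.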
Full config replace (PUT)
# Replace the entire config
curl -X PUT \
--data-binary @unit.json \
--unix-socket /var/run/control.unit.sock \
http://localhost/config
Update only the routes section (PUT to /config/routes)
# Update just the routes without touching applications
curl -X PUT \
--data-binary @routes.json \
--unix-socket /var/run/control.unit.sock \
http://localhost/config/routes
Read current config
curl --unix-socket /var/run/control.unit.sock http://localhost/config | python3 -m json.tool
Insert a new route step at position 0 (prepend)
# POST appends to the end of an array; the control API has no insert-at-index call.
# To prepend, read the current routes, put the new step in front, and PUT the array back.
# Assumes jq is installed.
curl -s --unix-socket /var/run/control.unit.sock http://localhost/config/routes \
| jq '[{
"match": {
"headers": {
"User-Agent": ["*GPTBot*", "*ClaudeBot*", "*anthropic-ai*", "*CCBot*", "*Google-Extended*"]
}
},
"action": { "return": 403 }
}] + .' \
| curl -X PUT \
--data-binary @- \
--unix-socket /var/run/control.unit.sock \
http://localhost/config/routes
Use /config/routes/0 to address the first element, /config/routes/1 for the second, and so on. PUT to an index replaces that element, and DELETE removes it. POST to the array itself (/config/routes) appends a new element at the end.
X-Robots-Tag via response_headers
Add X-Robots-Tag to all application responses by including response_headers in the pass action:
{
"routes": [
{
"match": {
"headers": {
"User-Agent": ["*GPTBot*", "*ClaudeBot*", "*anthropic-ai*", "*CCBot*", "*Google-Extended*", "*AhrefsBot*", "*Bytespider*", "*Amazonbot*", "*Diffbot*", "*FacebookBot*", "*cohere-ai*", "*PerplexityBot*", "*YouBot*"]
}
},
"action": { "return": 403 }
},
{
"action": {
"pass": "applications/myapp",
"response_headers": {
"X-Robots-Tag": "noai, noimageai",
"X-Content-Type-Options": "nosniff"
}
}
}
]
}
response_headers are injected into responses from that action step; both the application pass and static file share actions support them. They are applied in addition to any headers your application sets. If your application already sets X-Robots-Tag, both values will appear, so use application-level logic to set it conditionally instead.
Serving robots.txt as a static file
Add a route step that intercepts /robots.txt requests and serves a static file directly — without passing the request to your application:
{
"routes": [
{
"match": {
"headers": {
"User-Agent": ["*GPTBot*", "*ClaudeBot*", "*anthropic-ai*", "*CCBot*", "*Google-Extended*", "*AhrefsBot*", "*Bytespider*", "*PerplexityBot*"]
}
},
"action": { "return": 403 }
},
{
"match": { "uri": "/robots.txt" },
"action": {
"share": "/var/www/static$uri",
"response_headers": {
"Cache-Control": "public, max-age=86400",
"X-Robots-Tag": "noai, noimageai"
}
}
},
{
"action": {
"pass": "applications/myapp",
"response_headers": {
"X-Robots-Tag": "noai, noimageai"
}
}
}
]
}
Place your robots.txt at /var/www/static/robots.txt. The $uri variable in the share path resolves to the request URI (/robots.txt), so the full path becomes /var/www/static/robots.txt.
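The file itself can be generated from the same bot list that the route step uses, which keeps the two in sync. The sketch below is one possible way to do that, under stated assumptions: the names AI_BOTS and build_robots_txt are hypothetical, the output path is the static directory from the example above, and the policy (disallow everything for listed bots, allow everyone else) is an illustration rather than a recommendation from this guide.

```python
from pathlib import Path

# Same user-agent tokens as the wildcard patterns in the route step, without the asterisks.
AI_BOTS = [
    "GPTBot", "ClaudeBot", "anthropic-ai", "CCBot", "Google-Extended",
    "AhrefsBot", "Bytespider", "Amazonbot", "Diffbot", "FacebookBot",
    "cohere-ai", "PerplexityBot", "YouBot",
]


def build_robots_txt(bots):
    """Return robots.txt text that disallows each listed bot and allows all others."""
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Writing under /var/www/static usually requires elevated permissions.
    Path("/var/www/static/robots.txt").write_text(build_robots_txt(AI_BOTS))
```

With the route order shown above, the User-Agent step runs first, so the listed bots receive 403 for /robots.txt as well. The file mainly informs crawlers that are not in the block list.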
Full unit.json example
{
"listeners": {
"*:80": {
"pass": "routes"
},
"*:443": {
"pass": "routes",
"tls": {
"certificate": "bundle"
}
}
},
"routes": [
{
"match": {
"headers": {
"User-Agent": [
"*GPTBot*",
"*ClaudeBot*",
"*anthropic-ai*",
"*CCBot*",
"*Google-Extended*",
"*AhrefsBot*",
"*Bytespider*",
"*Amazonbot*",
"*Diffbot*",
"*FacebookBot*",
"*cohere-ai*",
"*PerplexityBot*",
"*YouBot*"
]
}
},
"action": {
"return": 403
}
},
{
"match": {
"uri": "/robots.txt"
},
"action": {
"share": "/var/www/static$uri",
"response_headers": {
"Cache-Control": "public, max-age=86400"
}
}
},
{
"match": {
"uri": ["/static/*", "/assets/*"]
},
"action": {
"share": "/var/www$uri"
}
},
{
"action": {
"pass": "applications/myapp",
"response_headers": {
"X-Robots-Tag": "noai, noimageai",
"X-Content-Type-Options": "nosniff",
"X-Frame-Options": "SAMEORIGIN"
}
}
}
],
"applications": {
"myapp": {
"type": "python 3",
"path": "/var/www/myapp",
"module": "wsgi",
"user": "www-data",
"group": "www-data",
"environment": {
"NODE_ENV": "production"
}
}
},
"settings": {
"http": {
"header_read_timeout": 30,
"body_read_timeout": 30,
"send_timeout": 30,
"idle_timeout": 180,
"max_body_size": 10485760
}
}
}
Apply and verify
# Apply config
curl -X PUT \
--data-binary @unit.json \
--unix-socket /var/run/control.unit.sock \
http://localhost/config
# Verify response
curl -s --unix-socket /var/run/control.unit.sock http://localhost/config | python3 -m json.tool | head -20
# Test bot blocking (print only the HTTP status code)
curl -s -o /dev/null -w "%{http_code}\n" -A "GPTBot/1.0" http://localhost/
# Expected: 403
# Test legitimate request
curl -s -o /dev/null -w "%{http_code}\n" -A "Mozilla/5.0 (compatible; Googlebot/2.1)" http://localhost/
# Expected: 200
Docker deployment
docker-compose.yml
services:
  unit:
    image: unit:1.32.1-python3.12
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./unit.json:/docker-entrypoint.d/unit.json:ro
      - ./myapp:/var/www/myapp:ro
      - ./static:/var/www/static:ro
      - unit_state:/var/lib/unit
    restart: unless-stopped
volumes:
  unit_state:
The image's entrypoint reads /docker-entrypoint.d/ on first boot. If the state volume is empty, it applies unit.json automatically. On subsequent starts (state volume has data), it uses the saved state, so PUT to the control socket to update.
Available Docker image tags
unit:1.32.1-python3.12 # Python 3.12
unit:1.32.1-node21 # Node.js 21
unit:1.32.1-go1.22 # Go 1.22
unit:1.32.1-php8.3 # PHP 8.3
unit:1.32.1-ruby3.3 # Ruby 3.3
unit:1.32.1-jsc11 # Java (JDK 11)
FAQ
How do I block AI bots by User-Agent in NGINX Unit?
Add a route step with match.headers["User-Agent"] set to an array of wildcard patterns ("*GPTBot*") or to a single regex string with a ~ prefix, and action.return = 403. Place it as the first step in the routes array, then apply it through the control API with curl -X PUT --data-binary @unit.json --unix-socket ....
How does NGINX Unit routing work for bot blocking?
Routes are a JSON array evaluated in order — first match wins. Add a bot-blocking step (match on User-Agent, action return 403) before the application pass step. Array values in a match condition use OR logic.
How do I add X-Robots-Tag in NGINX Unit?
Use response_headers in the pass action: {"response_headers": {"X-Robots-Tag": "noai, noimageai"}}. Applied to all responses from that step. Static share actions also support response_headers.
How do I serve robots.txt in NGINX Unit?
Add a route step with match.uri = "/robots.txt" and action.share = "/var/www/static$uri". Place it before the application pass step — Unit serves the file directly without hitting your app.
How do I apply configuration changes without restarting NGINX Unit?
Use the control API: curl -X PUT --data-binary @config.json --unix-socket /var/run/control.unit.sock http://localhost/config. Changes are live immediately. To update one section without replacing the entire config, send the PUT to a more specific path (e.g. /config/routes). The control API works with GET, PUT, POST, and DELETE; it has no PATCH method.
What header match syntax does NGINX Unit support?
Array of strings (OR) with * wildcard, or a single string prefixed with ~ for PCRE regex. Example wildcard: "*GPTBot*". Example regex: "~(?i)(GPTBot|ClaudeBot)". Case-insensitive regex requires the (?i) inline flag.
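The effect of the (?i) flag can be checked offline before a pattern is deployed. The sketch below uses Python's re module as a stand-in for PCRE, which is an assumption: the two engines agree on simple alternations like this one but differ in advanced features. The helper name unit_regex_matches is hypothetical, and it assumes the pattern is searched anywhere in the value unless it is anchored with ^ or $.

```python
import re

PATTERN = "~(?i)(GPTBot|ClaudeBot)"


def unit_regex_matches(pattern, value):
    """Approximate a ~-prefixed match string by searching value with Python's re."""
    if not pattern.startswith("~"):
        raise ValueError("regex match strings must start with ~")
    return re.search(pattern[1:], value) is not None


if __name__ == "__main__":
    for agent in ("GPTBot/1.0", "mozilla/5.0 (compatible; claudebot/1.0)", "Mozilla/5.0 Firefox/125.0"):
        print(agent, "->", unit_regex_matches(PATTERN, agent))
```

Removing (?i) from PATTERN makes the lowercase claudebot agent pass through unblocked in this model.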