How to Block AI Bots on NGINX Unit: Complete 2026 Guide

NGINX Unit is a modern, polyglot application server from NGINX Inc. Unlike traditional web servers, it is configured entirely through a JSON REST API — no config file syntax, no restarts. It runs Python, Node.js, Go, PHP, Ruby, Java, and Perl applications natively. Bot blocking uses Unit's routing system with header match conditions and return actions.

How Unit routing works

NGINX Unit processes requests through routes — a JSON array of steps evaluated in order. Each step has an optional match object (conditions) and an action object (what to do). The first step whose match passes wins.

{
  "routes": [
    {
      "match": { /* conditions */ },
      "action": { /* pass / return / share */ }
    },
    {
      /* no match = matches everything (default route) */
      "action": { "pass": "applications/myapp" }
    }
  ]
}

For bot blocking, add a step with a match on the User-Agent header and an action of {"return": 403} — placed before the application pass step.
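The first-match rule can be illustrated with a few lines of Python. This is only a model of the evaluation order, not Unit's implementation; `select_action`, `is_gptbot`, and the request dictionary are hypothetical names introduced for the sketch.

```python
from typing import Callable, Optional

# A step is modelled as (predicate, action). A predicate of None stands for a
# step with no "match" object, which accepts every request.
Step = tuple[Optional[Callable[[dict], bool]], dict]


def select_action(steps: list[Step], request: dict) -> Optional[dict]:
    """Return the action of the first step whose predicate accepts the request."""
    for predicate, action in steps:
        if predicate is None or predicate(request):
            return action
    return None  # no step matched at all


def is_gptbot(request: dict) -> bool:
    """Hypothetical predicate standing in for a User-Agent match condition."""
    return "GPTBot" in request.get("User-Agent", "")


steps: list[Step] = [
    (is_gptbot, {"return": 403}),            # bot-blocking step comes first
    (None, {"pass": "applications/myapp"}),  # default route
]
```

Reversing the two steps would send every request, bots included, to the application, which is why the blocking step has to sit before the catch-all.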

Bot blocking via header match

Wildcard pattern matching (simplest)

{
  "routes": [
    {
      "match": {
        "headers": {
          "User-Agent": [
            "*GPTBot*",
            "*ClaudeBot*",
            "*anthropic-ai*",
            "*CCBot*",
            "*Google-Extended*",
            "*AhrefsBot*",
            "*Bytespider*",
            "*Amazonbot*",
            "*Diffbot*",
            "*FacebookBot*",
            "*cohere-ai*",
            "*PerplexityBot*",
            "*YouBot*"
          ]
        }
      },
      "action": {
        "return": 403
      }
    },
    {
      "action": {
        "pass": "applications/myapp"
      }
    }
  ]
}
Wildcard matching: In NGINX Unit, * in a match string matches any sequence of characters (including none). "*GPTBot*" matches any User-Agent containing GPTBot anywhere in the string. Array values in a match condition are OR logic — if any pattern matches, the condition is true.
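To test a pattern list before deploying it, the wildcard and OR semantics described above can be approximated in Python. This is a rough sketch: `wildcard_to_regex` and `matches_any` are hypothetical helpers, and the comparison here is case-sensitive, which may not mirror how Unit treats header values.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern where * means 'any run of characters' into a regex."""
    literal_parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(literal_parts), re.DOTALL)


def matches_any(value: str, patterns: list[str]) -> bool:
    """OR logic: true as soon as one pattern matches the whole value."""
    return any(wildcard_to_regex(p).fullmatch(value) for p in patterns)
```

For example, `matches_any("Mozilla/5.0 (compatible; GPTBot/1.0)", ["*GPTBot*", "*ClaudeBot*"])` is true, while a pattern without wildcards, such as `"GPTBot"`, only matches a value that is exactly `GPTBot`.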

Regex matching (more precise)

{
  "routes": [
    {
      "match": {
        "headers": {
          "User-Agent": "~(?i)(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot)"
        }
      },
      "action": {
        "return": 403
      }
    },
    {
      "action": {
        "pass": "applications/myapp"
      }
    }
  ]
}
Regex prefix: In NGINX Unit match conditions, prefix a string with ~ to use it as a PCRE regular expression. The (?i) flag makes the match case-insensitive. Without the ~ prefix, the string is treated as a literal pattern with * wildcards only.
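A long alternation is easy to mistype, so it can help to generate the pattern from a list. The sketch below builds the `~(?i)(...)` string and checks it with Python's `re` module. Note that `re` is not PCRE; for plain alternations with `(?i)` the two should agree, but treat the check as an approximation. `build_ua_regex` and `would_match` are hypothetical helper names.

```python
import json
import re


def build_ua_regex(bot_names: list[str]) -> str:
    """Build a Unit-style regex pattern: ~ prefix, (?i) flag, escaped names."""
    alternation = "|".join(re.escape(name) for name in bot_names)
    return f"~(?i)({alternation})"


def would_match(unit_pattern: str, user_agent: str) -> bool:
    """Approximate check using Python's re module in place of PCRE."""
    if not unit_pattern.startswith("~"):
        raise ValueError("expected a pattern with the ~ regex prefix")
    return re.search(unit_pattern[1:], user_agent) is not None


pattern = build_ua_regex(["GPTBot", "ClaudeBot"])
# json.dumps takes care of escaping backslashes that re.escape may add.
header_match_json = json.dumps({"User-Agent": pattern})
```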

Custom response body for blocked bots

{
  "match": {
    "headers": {
      "User-Agent": ["*GPTBot*", "*ClaudeBot*", "*anthropic-ai*"]
    }
  },
  "action": {
    "return": 403,
    "response_headers": {
      "Content-Type": "text/plain"
    }
  }
}
No response body in return action: NGINX Unit's return action sends the HTTP status code and headers, but does not support a custom response body in the route config. For a custom body, forward blocked requests to a small application that returns the 403 with a body, or use an upstream nginx instance for the body content.
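The "small application" option mentioned above could look like the following WSGI module. It is a minimal sketch under assumptions: the file name `blocked.py`, the message text, and the application name used in the note below are all hypothetical.

```python
# blocked.py: hypothetical module name for a tiny WSGI app.
BODY = b"Automated AI crawlers are not permitted on this site.\n"


def application(environ, start_response):
    """Answer every request with 403 and a short plain-text body."""
    start_response(
        "403 Forbidden",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(BODY))),
        ],
    )
    return [BODY]
```

The bot-matching step would then use an action such as {"pass": "applications/blocked"} instead of {"return": 403}, with a Python application named blocked (an assumed name) defined in the applications section and pointing at this module.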

Applying config via the control API

NGINX Unit's configuration is managed through a Unix socket REST API. No restart required — changes take effect immediately.

Full config replace (PUT)

# Replace the entire config
curl -X PUT \
  --data-binary @unit.json \
  --unix-socket /var/run/control.unit.sock \
  http://localhost/config
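
Because a full PUT replaces everything, a quick local check of the file before sending it can catch simple mistakes. The sketch below parses the JSON and looks for route steps that pass to an application that is not defined. It assumes `routes` is a plain array, as in this guide, and the helper names are hypothetical; it does not replace the validation Unit performs itself.

```python
import json


def load_config(text: str) -> dict:
    """Parse the file contents; json.JSONDecodeError signals a syntax problem."""
    return json.loads(text)


def find_dangling_passes(config: dict) -> list[str]:
    """List route steps whose "pass" names an application that is not defined."""
    defined = set(config.get("applications", {}))
    problems = []
    for index, step in enumerate(config.get("routes", [])):
        parts = step.get("action", {}).get("pass", "").split("/")
        if parts[0] == "applications" and len(parts) > 1 and parts[1] not in defined:
            problems.append(
                f"routes[{index}] passes to undefined application {parts[1]!r}"
            )
    return problems
```

An empty list from `find_dangling_passes(load_config(open("unit.json").read()))` means every pass target in the routes array has a matching entry under applications.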

Update only the routes section (PUT to /config/routes)

# Update just the routes without touching applications
curl -X PUT \
  --data-binary @routes.json \
  --unix-socket /var/run/control.unit.sock \
  http://localhost/config/routes

Read current config

curl --unix-socket /var/run/control.unit.sock http://localhost/config | python3 -m json.tool

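Prepend a new route step (read, modify, PUT)

# POST appends to the routes array, so it cannot place a step first.
# Fetch the routes, prepend the bot-blocking step with jq (assumed to be installed),
# and PUT the resulting array back.
curl -s --unix-socket /var/run/control.unit.sock http://localhost/config/routes \
  | jq '[{
      "match": {
        "headers": {
          "User-Agent": ["*GPTBot*", "*ClaudeBot*", "*anthropic-ai*", "*CCBot*", "*Google-Extended*"]
        }
      },
      "action": { "return": 403 }
    }] + .' \
  | curl -X PUT \
      --data-binary @- \
      --unix-socket /var/run/control.unit.sock \
      http://localhost/config/routes
Array indexing in the API: Use /config/routes/0 to address the first element, /config/routes/1 for the second, etc. PUT to an index replaces that element, and DELETE removes it. POST to the array itself (/config/routes) appends a new element at the end.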
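
Prepending can also be done client-side: read the current routes array, put the new step in front, and PUT the whole array back to /config/routes. The list manipulation is sketched below; `prepend_step` is a hypothetical helper, and dropping identical copies is a design choice that keeps repeated runs from stacking duplicate steps.

```python
import copy


def prepend_step(routes: list[dict], step: dict) -> list[dict]:
    """Return a new routes list with `step` first, dropping any identical copy."""
    remaining = [existing for existing in routes if existing != step]
    return [copy.deepcopy(step)] + remaining


block_step = {
    "match": {"headers": {"User-Agent": ["*GPTBot*", "*ClaudeBot*"]}},
    "action": {"return": 403},
}
app_step = {"action": {"pass": "applications/myapp"}}

updated = prepend_step([app_step], block_step)
```

Serializing `updated` with `json.dumps` gives a body suitable for a PUT to /config/routes.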

X-Robots-Tag via response_headers

Add X-Robots-Tag to all application responses by including response_headers in the pass action:

{
  "routes": [
    {
      "match": {
        "headers": {
          "User-Agent": ["*GPTBot*", "*ClaudeBot*", "*anthropic-ai*", "*CCBot*", "*Google-Extended*", "*AhrefsBot*", "*Bytespider*", "*Amazonbot*", "*Diffbot*", "*FacebookBot*", "*cohere-ai*", "*PerplexityBot*", "*YouBot*"]
        }
      },
      "action": { "return": 403 }
    },
    {
      "action": {
        "pass": "applications/myapp",
        "response_headers": {
          "X-Robots-Tag": "noai, noimageai",
          "X-Content-Type-Options": "nosniff"
        }
      }
    }
  ]
}
response_headers placement: Headers in response_headers are applied to responses from that action step; both the application pass and static file share actions support them. A field listed here is set on the response after your application has produced its own headers, so if the application already sets X-Robots-Tag, the value configured here takes precedence. To vary the header per response, set it in the application and leave it out of response_headers.

Serving robots.txt as a static file

Add a route step that intercepts /robots.txt requests and serves a static file directly — without passing the request to your application:

{
  "routes": [
    {
      "match": {
        "headers": {
          "User-Agent": ["*GPTBot*", "*ClaudeBot*", "*anthropic-ai*", "*CCBot*", "*Google-Extended*", "*AhrefsBot*", "*Bytespider*", "*PerplexityBot*"]
        }
      },
      "action": { "return": 403 }
    },
    {
      "match": { "uri": "/robots.txt" },
      "action": {
        "share": "/var/www/static$uri",
        "response_headers": {
          "Cache-Control": "public, max-age=86400",
          "X-Robots-Tag": "noai, noimageai"
        }
      }
    },
    {
      "action": {
        "pass": "applications/myapp",
        "response_headers": {
          "X-Robots-Tag": "noai, noimageai"
        }
      }
    }
  ]
}

Place your robots.txt at /var/www/static/robots.txt. The $uri variable in the share path resolves to the request URI (/robots.txt), so the full path becomes /var/www/static/robots.txt.
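
The path expansion and the file itself can both be sketched in Python. `resolve_share` only imitates the `$uri` substitution with `string.Template`, and `render_robots_txt` writes one Disallow group per bot followed by a catch-all group; both names, and the choice to allow all other crawlers, are assumptions made for illustration.

```python
from string import Template


def resolve_share(share: str, uri: str) -> str:
    """Imitate how $uri in a share path expands to the request URI."""
    return Template(share).substitute(uri=uri)


def render_robots_txt(bot_names: list[str]) -> str:
    """One 'Disallow: /' group per listed bot, then a permissive default group."""
    groups = [f"User-agent: {name}\nDisallow: /\n" for name in bot_names]
    groups.append("User-agent: *\nAllow: /\n")
    return "\n".join(groups)
```

Writing the result of `render_robots_txt(["GPTBot", "ClaudeBot", "CCBot"])` to /var/www/static/robots.txt keeps the file in step with the list used in the route match.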

Full unit.json example

{
  "listeners": {
    "*:80": {
      "pass": "routes"
    },
    "*:443": {
      "pass": "routes",
      "tls": {
        "certificate": "bundle"
      }
    }
  },

  "routes": [
    {
      "match": {
        "headers": {
          "User-Agent": [
            "*GPTBot*",
            "*ClaudeBot*",
            "*anthropic-ai*",
            "*CCBot*",
            "*Google-Extended*",
            "*AhrefsBot*",
            "*Bytespider*",
            "*Amazonbot*",
            "*Diffbot*",
            "*FacebookBot*",
            "*cohere-ai*",
            "*PerplexityBot*",
            "*YouBot*"
          ]
        }
      },
      "action": {
        "return": 403
      }
    },
    {
      "match": {
        "uri": "/robots.txt"
      },
      "action": {
        "share": "/var/www/static$uri",
        "response_headers": {
          "Cache-Control": "public, max-age=86400"
        }
      }
    },
    {
      "match": {
        "uri": ["/static/*", "/assets/*"]
      },
      "action": {
        "share": "/var/www$uri"
      }
    },
    {
      "action": {
        "pass": "applications/myapp",
        "response_headers": {
          "X-Robots-Tag": "noai, noimageai",
          "X-Content-Type-Options": "nosniff",
          "X-Frame-Options": "SAMEORIGIN"
        }
      }
    }
  ],

  "applications": {
    "myapp": {
      "type": "python 3",
      "path": "/var/www/myapp",
      "module": "wsgi",
      "user": "www-data",
      "group": "www-data",
      "environment": {
        "NODE_ENV": "production"
      }
    }
  },

  "settings": {
    "http": {
      "header_read_timeout": 30,
      "body_read_timeout": 30,
      "send_timeout": 30,
      "idle_timeout": 180,
      "max_body_size": 10485760
    }
  }
}
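
The same bot list appears in several places in this guide, so generating the routes array from one list is a way to keep it consistent. The sketch below covers only the blocking, robots.txt, and application steps; `build_routes` and `AI_BOTS` are hypothetical names, and the paths are the ones used in the example above.

```python
import json

AI_BOTS = ["GPTBot", "ClaudeBot", "anthropic-ai", "CCBot", "Google-Extended"]


def build_routes(bot_names: list[str], app_name: str) -> list[dict]:
    """Assemble a routes array: block bots, serve robots.txt, then pass to the app."""
    block = {
        "match": {"headers": {"User-Agent": [f"*{name}*" for name in bot_names]}},
        "action": {"return": 403},
    }
    robots = {
        "match": {"uri": "/robots.txt"},
        "action": {"share": "/var/www/static$uri"},
    }
    app = {"action": {"pass": f"applications/{app_name}"}}
    return [block, robots, app]


routes_json = json.dumps(build_routes(AI_BOTS, "myapp"), indent=2)
```

Saving `routes_json` as routes.json produces a file that can be sent to /config/routes with PUT.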

Apply and verify

# Apply config
curl -X PUT \
  --data-binary @unit.json \
  --unix-socket /var/run/control.unit.sock \
  http://localhost/config

# Verify response
curl -s --unix-socket /var/run/control.unit.sock http://localhost/config | python3 -m json.tool | head -20

# Test bot blocking (print only the status code)
curl -s -o /dev/null -w "%{http_code}\n" -A "GPTBot/1.0" http://localhost/
# Expected: 403

# Test legitimate request
curl -s -o /dev/null -w "%{http_code}\n" -A "Mozilla/5.0 (compatible; Googlebot/2.1)" http://localhost/
# Expected: 200
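
The same status checks can be scripted, for instance to run after each deployment. This is a minimal sketch using only the standard library; `status_for` is a hypothetical helper and it handles plain http:// URLs only.

```python
import http.client
from urllib.parse import urlsplit


def status_for(url: str, user_agent: str, timeout: float = 5.0) -> int:
    """Send one GET with the given User-Agent and return the HTTP status code."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/", headers={"User-Agent": user_agent})
        return conn.getresponse().status
    finally:
        conn.close()
```

With the configuration from this guide applied, `status_for("http://localhost/", "GPTBot/1.0")` should return 403, and a browser-like User-Agent should get whatever status the application returns.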

Docker deployment

docker-compose.yml

services:
  unit:
    image: unit:1.32.1-python3.12
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./unit.json:/docker-entrypoint.d/unit.json:ro
      - ./myapp:/var/www/myapp:ro
      - ./static:/var/www/static:ro
      - unit_state:/var/lib/unit
    restart: unless-stopped

volumes:
  unit_state:
Docker entrypoint: The official NGINX Unit Docker image loads JSON files from /docker-entrypoint.d/ on first boot. If the state volume is empty, it applies unit.json automatically. On subsequent starts (state volume has data), it uses the saved state — PUT to the control socket to update.

Available Docker image tags

unit:1.32.1-python3.12   # Python 3.12
unit:1.32.1-node21       # Node.js 21
unit:1.32.1-go1.22       # Go 1.22
unit:1.32.1-php8.3       # PHP 8.3
unit:1.32.1-ruby3.3      # Ruby 3.3
unit:1.32.1-jsc11        # Java (JDK 11)

FAQ

How do I block AI bots by User-Agent in NGINX Unit?

Add a route step with match.headers["User-Agent"] set to an array of wildcard patterns ("*GPTBot*") or to a single regex string with a ~ prefix, and action.return = 403. Place it as the first step in the routes array. Apply it via the control API: curl -X PUT --data-binary @unit.json --unix-socket ....

How does NGINX Unit routing work for bot blocking?

Routes are a JSON array evaluated in order — first match wins. Add a bot-blocking step (match on User-Agent, action return 403) before the application pass step. Array values in a match condition use OR logic.

How do I add X-Robots-Tag in NGINX Unit?

Use response_headers in the pass action: {"response_headers": {"X-Robots-Tag": "noai, noimageai"}}. Applied to all responses from that step. Static share actions also support response_headers.

How do I serve robots.txt in NGINX Unit?

Add a route step with match.uri = "/robots.txt" and action.share = "/var/www/static$uri". Place it before the application pass step — Unit serves the file directly without hitting your app.

How do I apply configuration changes without restarting NGINX Unit?

Use the control API: curl -X PUT --data-binary @config.json --unix-socket /var/run/control.unit.sock http://localhost/config. Changes are live immediately. Use PUT on a specific path (e.g. /config/routes) to update one section without replacing the entire config.

What header match syntax does NGINX Unit support?

Array of strings (OR) with * wildcard, or a single string prefixed with ~ for PCRE regex. Example wildcard: "*GPTBot*". Example regex: "~(?i)(GPTBot|ClaudeBot)". Case-insensitive regex requires the (?i) inline flag.
