
How to Block AI Bots on Nginx: Complete 2026 Guide

Nginx sits in front of everything — it's the first layer your traffic touches, which makes it the most powerful place to block AI crawlers. Whether you're serving a static site directly or running nginx as a reverse proxy in front of Node, Python, or PHP, the bot-blocking config is the same: a map block in your http {} context, a return 403 before the request reaches your origin, and add_header X-Robots-Tag on all responses.

The map block must go in http {} — not server {}

The most common nginx bot-blocking mistake: placing the map directive inside a server or location block. Nginx will refuse to start with a config error. The map directive belongs in the http {} context, defined once globally, then used inside any number of server blocks.
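To make the placement concrete, here is a minimal sketch of the wrong and the right location (a stripped-down config, not a full working server):

```nginx
# WRONG — map inside server {}. nginx -t fails with an error like:
#   nginx: [emerg] "map" directive is not allowed here
# server {
#     map $http_user_agent $bad_bot { default 0; ~*GPTBot 1; }
# }

# RIGHT — map at http {} level, referenced from any server block
http {
    map $http_user_agent $bad_bot {
        default  0;
        ~*GPTBot 1;
    }

    server {
        location / {
            if ($bad_bot) { return 403; }
        }
    }
}
```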

Methods at a glance

Method                     What it does                             Where it lives
robots.txt location block  Signals bots which paths are off-limits  Webroot / root directive
map + if ($bad_bot)        Hard 403 on known AI User-Agents         http {} then server {}
add_header X-Robots-Tag    noai header on all HTTP responses        server {} or location {}
noai <meta> tag            AI training opt-out per HTML page        HTML files / layout template
limit_req_zone             Rate-limit to slow bot scraping          http {} then location {}
geo block                  IP-range (CIDR) blocking                 http {} context

1. robots.txt — location block

Nginx serves files from the directory set by the root directive (e.g. /var/www/html). Place robots.txt in that directory, then add a dedicated location block so nginx handles it cleanly — no PHP, no upstream, no access log noise.

# nginx server block
server {
    listen 443 ssl;
    server_name example.com;
    root /var/www/html;

    # Exact-match location for robots.txt — fastest evaluation
    location = /robots.txt {
        try_files $uri =404;
        access_log  off;       # don't pollute access logs
        log_not_found off;     # don't log 404 if absent
        expires     1d;
        add_header  Cache-Control "public, max-age=86400";
    }
}

Your robots.txt should explicitly disallow AI training crawlers:

# /var/www/html/robots.txt

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /

robots.txt is advisory — compliant bots will respect it, aggressive scrapers will not. Use the map block below for hard enforcement.

2. Hard 403 blocking — map block

The map directive matches $http_user_agent against a list of patterns and sets a variable. Nginx evaluates map lazily — only when the variable is first used — so it adds no overhead for normal requests. The map block must be inside http {}, not inside server {} or location {}.

# /etc/nginx/nginx.conf  (or included .conf in http block)

http {
    # ── AI bot User-Agent map ──────────────────────────────────────────
    # Must be inside http {}, NOT inside server {} or location {}
    map $http_user_agent $bad_bot {
        default          0;       # allow everything by default
        ~*GPTBot         1;
        ~*ChatGPT-User   1;
        ~*ClaudeBot      1;
        ~*Claude-Web     1;
        ~*anthropic-ai   1;
        ~*CCBot          1;
        ~*Google-Extended 1;
        ~*PerplexityBot  1;
        ~*Amazonbot      1;
        ~*Bytespider     1;
        ~*YouBot         1;
        ~*Applebot       1;
        ~*DuckAssistBot  1;
        ~*meta-externalagent 1;
        ~*MistralAI-Spider 1;
        ~*oai-searchbot  1;
    }

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

Inside each server block, check $bad_bot and return 403. Always exempt /robots.txt so compliant bots can still read your directives:

server {
    listen 443 ssl;
    server_name example.com;
    root /var/www/html;

    # ── robots.txt — exempt from bot blocking ──────────────────────────
    location = /robots.txt {
        try_files $uri =404;
        access_log off;
        log_not_found off;
    }

    # ── Block known AI bots ────────────────────────────────────────────
    # "if" is safe here — we're only returning a status, not using
    # proxy_pass, rewrite, or other directives that interact poorly with if
    location / {
        if ($bad_bot) {
            return 403 "Forbidden";
        }

        # ... your normal config (try_files, proxy_pass, etc.)
        try_files $uri $uri/ /index.html;
    }
}

On using if in nginx

Nginx docs warn against if because it interacts badly with proxy_pass and rewrite. For a pure return 403 with no other directives in the same block, if is safe and correct. For IP-range blocking you can avoid if entirely with plain allow/deny directives, or match CIDR ranges against a geo block.
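For completeness, here is a geo sketch for IP-range blocking. The CIDR ranges below are documentation placeholders (TEST-NET ranges), not real bot networks — substitute the ranges you actually want to block. The if here is the same safe bare-return pattern:

```nginx
# http {} context — map client IPs to a flag
geo $blocked_ip {
    default         0;
    203.0.113.0/24  1;   # placeholder range (TEST-NET-3)
    198.51.100.0/24 1;   # placeholder range (TEST-NET-2)
}

# server {} context — same bare-return pattern as $bad_bot
server {
    listen 443 ssl;
    server_name example.com;

    location / {
        if ($blocked_ip) { return 403; }
        try_files $uri $uri/ /index.html;
    }
}
```

Unlike map on $remote_addr, geo understands CIDR notation natively, so you don't need regex gymnastics to match address ranges.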

3. noai meta tag — static HTML

Nginx does not modify HTML content — it serves files as-is. For a static site, add the noai meta tag directly to every HTML file, or (better) to your base layout template in your SSG of choice (Hugo, Eleventy, Jekyll, Astro).

<!-- In your HTML <head> -->
<meta name="robots" content="noai, noimageai">

<!-- Or combined with other directives: -->
<meta name="robots" content="index, follow, noai, noimageai">

For SSG base layout templates:

<!-- Hugo: layouts/_default/baseof.html -->
<head>
  <meta name="robots" content="{{ with .Params.robots }}{{ . }}{{ else }}noai, noimageai{{ end }}">
</head>

<!-- Eleventy: _includes/base.njk -->
<head>
  <meta name="robots" content="{{ robots | default('noai, noimageai') }}">
</head>

<!-- Jekyll: _layouts/default.html -->
<head>
  <meta name="robots" content="{{ page.robots | default: 'noai, noimageai' }}">
</head>

The HTTP-layer equivalent is X-Robots-Tag (Section 4) — set via nginx add_header, no HTML changes needed.

4. X-Robots-Tag — add_header

X-Robots-Tag is the HTTP-header equivalent of the noai meta tag — useful for non-HTML resources (PDFs, images, API responses) and for sites where you can't easily modify HTML. The always keyword is critical: without it nginx only sends the header on 2xx/3xx responses.

server {
    listen 443 ssl;
    server_name example.com;

    # Add X-Robots-Tag to ALL responses (including 4xx/5xx)
    # "always" is required — without it, header is only sent on 2xx/3xx
    add_header X-Robots-Tag "noai, noimageai" always;

    # For HTML pages only (skip on API/JSON endpoints):
    location ~* \.html$ {
        add_header X-Robots-Tag "noai, noimageai" always;
        try_files $uri =404;
    }
}

add_header inheritance gotcha

In nginx, if a block defines any add_header directive, it replaces (not appends to) all inherited add_header directives from parent blocks. If your location blocks already have add_header directives (e.g. CORS headers), repeat the X-Robots-Tag header in those blocks too, or use ngx_http_headers_more_module (more_set_headers) which appends instead.
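A minimal sketch of the gotcha (the /api/ path and upstream port are placeholders):

```nginx
server {
    listen 443 ssl;
    server_name example.com;

    # Set at server level — inherited by locations WITHOUT their own add_header
    add_header X-Robots-Tag "noai, noimageai" always;

    location /api/ {
        # This block defines its own add_header, which silently DROPS the
        # inherited X-Robots-Tag — so it must be repeated here:
        add_header Access-Control-Allow-Origin "*" always;
        add_header X-Robots-Tag "noai, noimageai" always;
        proxy_pass http://127.0.0.1:3000;
    }

    location / {
        # No add_header here — the server-level X-Robots-Tag is inherited
        try_files $uri $uri/ /index.html;
    }
}
```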

5. Rate limiting — limit_req_zone

Rate limiting catches scrapers that rotate User-Agents or use unknown bot strings. The limit_req_zone directive lives in http {}; the limit_req directive applies it inside location {}.

# In http {} block:

# 10 MB zone keyed by client IP, max 10 requests/second
limit_req_zone $binary_remote_addr zone=general:10m rate=10r/s;

# Stricter zone for paths that bots love to hammer
limit_req_zone $binary_remote_addr zone=content:10m rate=2r/s;

# Apply in server block:
server {
    listen 443 ssl;

    location / {
        limit_req zone=general burst=20 nodelay;
        limit_req_status 429;  # Return 429 Too Many Requests

        if ($bad_bot) { return 403; }
        try_files $uri $uri/ /index.html;
    }

    # Stricter limit on content-heavy paths
    location /blog {
        limit_req zone=content burst=5 nodelay;
        limit_req_status 429;
        try_files $uri $uri/ =404;
    }
}

burst allows short traffic spikes above the rate; nodelay processes burst requests immediately (vs queuing them). Without limit_req_status, nginx returns 503 — set it to 429 for correct semantics.

6. Reverse proxy setup

When nginx fronts a Node, Python, PHP, or other upstream server, the bot check fires before proxy_pass — blocked requests never reach your origin. This is the most effective architecture for high-traffic sites: nginx handles the rejection at near-zero cost.

server {
    listen 443 ssl;
    server_name example.com;

    # Headers for upstream to identify real client IP
    proxy_set_header Host              $host;
    proxy_set_header X-Real-IP         $remote_addr;
    proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    add_header X-Robots-Tag "noai, noimageai" always;

    location = /robots.txt {
        root /var/www/html;
        try_files $uri =404;
        access_log off;
    }

    location / {
        # Bot check fires BEFORE proxy_pass — blocked bots never reach origin
        if ($bad_bot) {
            return 403 "Forbidden";
        }

        proxy_pass         http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header   Upgrade    $http_upgrade;
        proxy_set_header   Connection "upgrade";
        proxy_read_timeout 60s;
    }
}

7. Full nginx.conf example

A complete production-ready config combining all techniques above — map block in http {}, 403 blocking, robots.txt, X-Robots-Tag, and rate limiting. Works for both static sites and reverse proxy setups.

# /etc/nginx/nginx.conf

user  nginx;
worker_processes  auto;
error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;
    sendfile      on;
    keepalive_timeout 65;

    # ── AI bot User-Agent map ──────────────────────────────────────────
    # MUST be in http {} — not in server {} or location {}
    map $http_user_agent $bad_bot {
        default              0;
        ~*GPTBot             1;
        ~*ChatGPT-User       1;
        ~*ClaudeBot          1;
        ~*Claude-Web         1;
        ~*anthropic-ai       1;
        ~*CCBot              1;
        ~*Google-Extended    1;
        ~*PerplexityBot      1;
        ~*Amazonbot          1;
        ~*Bytespider         1;
        ~*YouBot             1;
        ~*Applebot           1;
        ~*DuckAssistBot      1;
        ~*meta-externalagent 1;
        ~*MistralAI-Spider   1;
        ~*oai-searchbot      1;
    }

    # ── Rate limiting zones ─────────────────────────────────────────────
    limit_req_zone $binary_remote_addr zone=general:10m rate=10r/s;

    # ── Redirect HTTP → HTTPS ───────────────────────────────────────────
    server {
        listen 80;
        server_name example.com www.example.com;
        return 301 https://example.com$request_uri;
    }

    # ── Main HTTPS server ───────────────────────────────────────────────
    server {
        listen 443 ssl http2;
        server_name example.com;
        root /var/www/html;

        ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
        ssl_protocols       TLSv1.2 TLSv1.3;

        # X-Robots-Tag on all responses
        add_header X-Robots-Tag "noai, noimageai" always;

        # ── robots.txt ──────────────────────────────────────────────────
        location = /robots.txt {
            try_files $uri =404;
            access_log    off;
            log_not_found off;
            expires       1d;
        }

        # ── All other requests ───────────────────────────────────────────
        location / {
            limit_req zone=general burst=20 nodelay;
            limit_req_status 429;

            # Block known AI bots — fires before proxy_pass / try_files
            if ($bad_bot) {
                return 403 "Forbidden";
            }

            # Static site:
            try_files $uri $uri/ /index.html;

            # Reverse proxy (comment out try_files, uncomment these):
            # proxy_pass         http://127.0.0.1:3000;
            # proxy_http_version 1.1;
            # proxy_set_header   Host              $host;
            # proxy_set_header   X-Real-IP         $remote_addr;
            # proxy_set_header   X-Forwarded-For   $proxy_add_x_forwarded_for;
            # proxy_set_header   X-Forwarded-Proto $scheme;
        }
    }
}

8. Docker deployment

Mount your nginx config and webroot as volumes, or bake them into the image for immutable deployments. The official nginx:alpine image is the standard choice — ~25 MB.

# Dockerfile — baked config (immutable, good for CI/CD)
FROM nginx:alpine

# Remove default config
RUN rm /etc/nginx/conf.d/default.conf

# Copy your config and webroot
COPY nginx.conf /etc/nginx/nginx.conf
COPY dist/       /var/www/html/

EXPOSE 80 443
CMD ["nginx", "-g", "daemon off;"]

# docker-compose.yml — volume-mounted config (easier to update)
services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./dist:/var/www/html:ro
      - ./certs:/etc/letsencrypt:ro
    restart: unless-stopped

Test your config before reloading — always:

# Inside the container:
nginx -t                  # test config syntax
nginx -s reload           # reload without downtime

# From host:
docker exec nginx-container nginx -t
docker exec nginx-container nginx -s reload

9. Ubuntu / Debian setup

On a bare-metal or VPS server, split your config across /etc/nginx/conf.d/ files for clarity. Keep the map block in a dedicated file included from the main nginx.conf.

# Install nginx
sudo apt update && sudo apt install -y nginx

# Create the bot map config (included from nginx.conf http block)
sudo tee /etc/nginx/conf.d/bot-map.conf > /dev/null <<'EOF'
map $http_user_agent $bad_bot {
    default              0;
    ~*GPTBot             1;
    ~*ClaudeBot          1;
    ~*CCBot              1;
    ~*Google-Extended    1;
    ~*PerplexityBot      1;
    ~*Amazonbot          1;
    ~*Bytespider         1;
}
EOF

# Create site config
sudo tee /etc/nginx/sites-available/example.com > /dev/null <<'EOF'
server {
    listen 80;
    server_name example.com;
    root /var/www/html/example.com;
    add_header X-Robots-Tag "noai, noimageai" always;

    location = /robots.txt {
        try_files $uri =404;
        access_log off;
    }

    location / {
        if ($bad_bot) { return 403; }
        try_files $uri $uri/ /index.html;
    }
}
EOF

# Enable site
sudo ln -s /etc/nginx/sites-available/example.com /etc/nginx/sites-enabled/

# Test and reload
sudo nginx -t && sudo systemctl reload nginx
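
Whichever deployment you use, verify the block from the outside with curl (example.com stands in for your domain; the expected statuses assume the config above):

```shell
# Spoof a blocked bot User-Agent — the map should return 403
curl -sI -A "GPTBot" https://example.com/ | head -n 1

# Normal browser User-Agent — should return 200 with the noai header
curl -sI -A "Mozilla/5.0" https://example.com/ | grep -iE '^HTTP|^x-robots-tag'

# robots.txt must stay reachable even for blocked bots
curl -s -A "GPTBot" https://example.com/robots.txt | head -n 3
```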

Frequently asked questions

Where does the map block go in nginx.conf?

Inside http {} — not inside server {} or location {}. A map directive at the server or location level causes an nginx config error on startup. Keep it in /etc/nginx/conf.d/bot-map.conf (included from the http block) for clean organisation.

Is "if ($bad_bot)" safe to use in nginx?

Yes, when used only to return a status code. Nginx if is dangerous when combined with proxy_pass, rewrite, or set — not for a plain return 403. For IP-based blocking you can avoid if entirely with allow/deny directives, or match CIDR ranges with a geo block.

Does add_header X-Robots-Tag work for error responses?

Only with the always keyword: add_header X-Robots-Tag "noai, noimageai" always. Without always, nginx sends the header only on 2xx and 3xx responses — 4xx and 5xx responses omit it. Also remember the inheritance rule: a child block with any add_header replaces all inherited ones.

How do I block bots on nginx without if?

For IP-based blocking, plain allow/deny directives need no if at all, and a geo block (in http {}) can map CIDR ranges to a variable. For User-Agent-based blocking there is no direct if-free equivalent of a hard 403, though you can map the User-Agent to a limit_req key that is empty for normal clients, so only bots get rate-limited. In practice, the simple if ($bad_bot) { return 403; } pattern is safe for this use case.

Does nginx bot blocking work as a reverse proxy?

Yes — and it's the most effective placement. The if ($bad_bot) { return 403; } check fires before proxy_pass, so blocked bots never reach your Node/Python/PHP upstream. This reduces origin load and protects your app server from bot traffic at near-zero nginx overhead.

How do I add noai meta tags on a static site served by nginx?

Nginx serves HTML files as-is — it doesn't inject content. Add <meta name="robots" content="noai, noimageai"> to the <head> of your HTML files, or to the base layout in your SSG (Hugo, Eleventy, Jekyll). The HTTP-layer alternative is add_header X-Robots-Tag — no HTML edits needed.

Is your site protected from AI bots?

Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.