How to Block AI Bots on Varnish Cache: Complete 2026 Guide

Varnish Cache is a high-performance HTTP accelerator (caching reverse proxy) used by major media publishers, e-commerce platforms, and CDNs. It is configured entirely through VCL (Varnish Configuration Language) — a domain-specific language for HTTP request handling. Bot blocking in Varnish is done in the vcl_recv subroutine, before cache lookup and before any backend hit.

vcl_recv — block bots before cache lookup
vcl_synth — custom 403 response
X-Robots-Tag in vcl_backend_response / vcl_deliver
Serving robots.txt from VCL
Rate limiting with vsthrottle
VCL ACL for IP-based exceptions
Full VCL example
Docker deployment
FAQ

vcl_recv — block bots before cache lookup

vcl_recv is the first subroutine called for every incoming request — it runs before cache lookup, before backend selection, and before any backend connection. This is the correct place to block bots: zero backend load, zero cache pollution.

vcl 4.1;

import std;

sub vcl_recv {
    # Block AI training and scraping bots by User-Agent
    if (req.http.User-Agent ~ "(?i)(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot)") {
        return(synth(403, "Forbidden"));
    }
}

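VCL regex syntax: The ~ operator does PCRE regex matching. The (?i) flag at the start makes the entire pattern case-insensitive, and alternatives are separated by | inside the group. Unlike nginx map blocks or HAProxy ACLs, VCL has no space-separated value lists, so all names go into one regex (or into several conditions joined with ||). One caveat: Google-Extended is a robots.txt control token only. Google crawls with its existing user-agent strings and does not send Google-Extended as a User-Agent, so that alternative has no effect in vcl_recv and only matters in robots.txt.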

Block in vcl_recv, not vcl_hit or vcl_miss. vcl_hit runs only when there is a cache hit — bots on uncached URLs would bypass the check. vcl_recv runs unconditionally for every request.
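
If the single pattern becomes hard to maintain, one option is to move the check into a custom subroutine and split it into smaller groups. The sketch below is a fragment to merge into an existing VCL file that already defines a backend. The subroutine name block_ai_bots and the grouping of bots are assumptions made for illustration.

```vcl
vcl 4.1;

# Hypothetical helper; any name that does not start with "vcl_" works.
sub block_ai_bots {
    # Group 1: AI training crawlers
    if (req.http.User-Agent ~ "(?i)(GPTBot|ClaudeBot|CCBot|Bytespider)") {
        return (synth(403, "Forbidden"));
    }

    # Group 2: SEO and data-extraction crawlers
    if (req.http.User-Agent ~ "(?i)(AhrefsBot|Diffbot)") {
        return (synth(403, "Forbidden"));
    }
}

sub vcl_recv {
    # Still runs for every request, before cache lookup
    call block_ai_bots;
}
```

Because the subroutine is called from vcl_recv, its return(synth(...)) is executed in the vcl_recv context, so the behaviour matches the inline version.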

Block and log (using std.log)

vcl 4.1;

import std;

sub vcl_recv {
    if (req.http.User-Agent ~ "(?i)(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot)") {
        std.log("AI bot blocked: " + req.http.User-Agent);
        return(synth(403, "Forbidden"));
    }
}

std.log() writes to the Varnish shared memory log (VSL), readable with varnishlog -g request -q "VCL_Log ~ \"AI bot\"".
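
Before enforcing the block, it can help to run the rule in log-only mode for a while and review what it would have matched. A minimal sketch, assuming a shortened version of the pattern above; the "(dry run)" message text is arbitrary:

```vcl
vcl 4.1;

import std;

sub vcl_recv {
    # Log-only mode: record the match, then let the request continue
    # through the normal cache lookup (no return here).
    if (req.http.User-Agent ~ "(?i)(GPTBot|ClaudeBot|CCBot|Bytespider|PerplexityBot)") {
        std.log("AI bot (dry run): " + req.http.User-Agent);
    }
}
```

The same varnishlog query still finds these entries, since the message starts with "AI bot". Once the matches look right, add the return(synth(403, "Forbidden")) line back.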

vcl_synth — custom 403 response

When return(synth(403, "Forbidden")) is called in vcl_recv, Varnish calls vcl_synth to build the synthetic response. Customise it to return a clean response body:

sub vcl_synth {
    if (resp.status == 403) {
        set resp.http.Content-Type = "text/plain; charset=utf-8";
        set resp.http.X-Robots-Tag = "noindex";
        synthetic("Forbidden" + {"
"});
        return(deliver);
    }

    # Other status codes fall through to the built-in vcl_synth,
    # which renders Varnish's default error page
}

VCL long-string syntax: {"..."} is VCL's long-string syntax, roughly equivalent to a here-doc. The newline after Forbidden is inside the long string. Use it when your synthetic body contains double quotes or line breaks. A long string ends at the first "} sequence, so the body itself must not contain one.
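
As an illustration of the long-string syntax with a multi-line body, the following sketch returns a small HTML page for blocked requests. The page wording is an assumption, and the fragment is meant to replace the plain-text vcl_synth above rather than sit alongside it.

```vcl
sub vcl_synth {
    if (resp.status == 403) {
        set resp.http.Content-Type = "text/html; charset=utf-8";
        # Everything between the opening and closing long-string
        # markers is sent verbatim, including the line breaks.
        synthetic({"<!DOCTYPE html>
<html>
  <head><title>403 Forbidden</title></head>
  <body><p>Automated access to this site is not permitted.</p></body>
</html>
"});
        return (deliver);
    }
}
```

The body deliberately contains no double quotes, which avoids any chance of closing the long string early.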

Return JSON for API consumers

sub vcl_synth {
    if (resp.status == 403) {
        set resp.http.Content-Type = "application/json; charset=utf-8";
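        # Keys are ordered so the JSON ends in a number: a body ending in a
        # double quote followed by a closing brace would end the long string early
        synthetic({"{"error":"Forbidden","status":403}"});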
        return(deliver);
    }
}

X-Robots-Tag in vcl_backend_response / vcl_deliver

Add X-Robots-Tag to all responses. Two options depending on when you want to set it:

vcl_backend_response — set on backend response (before caching)

sub vcl_backend_response {
    # Add X-Robots-Tag to all backend responses
    # This value is cached alongside the object
    set beresp.http.X-Robots-Tag = "noai, noimageai";
}

Cached with the object: Headers set in vcl_backend_response are stored in Varnish's cache alongside the object. All subsequent cache hits will include the header without another backend request.

vcl_deliver — set on delivery to client (after cache lookup)

sub vcl_deliver {
    # Set X-Robots-Tag on every response sent to the client
    # Use this if you need to set/override regardless of cache state
    set resp.http.X-Robots-Tag = "noai, noimageai";

    # Optional: remove internal headers before delivery
    unset resp.http.X-Varnish;
    unset resp.http.Via;
}

vcl_deliver runs just before sending the response to the client — it can override headers set in vcl_backend_response. Use it when you need unconditional header injection regardless of cache state.
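
Both snippets above overwrite whatever X-Robots-Tag the backend sent. If the application already sets directives such as noindex on some pages, a variant like the one below keeps them and appends the AI directives. This is a sketch that assumes the backend uses a single comma-separated X-Robots-Tag header.

```vcl
sub vcl_backend_response {
    if (!beresp.http.X-Robots-Tag) {
        # Backend sent nothing: set the AI directives
        set beresp.http.X-Robots-Tag = "noai, noimageai";
    } else if (beresp.http.X-Robots-Tag !~ "(?i)noai") {
        # Backend sent its own directives: keep them and append ours
        set beresp.http.X-Robots-Tag = beresp.http.X-Robots-Tag + ", noai, noimageai";
    }
}
```

The merged header is cached with the object, so the check runs once per fetch rather than once per request.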

Serving robots.txt from VCL

Serve robots.txt directly from Varnish without a backend hit:

sub vcl_recv {
    # Serve robots.txt directly from Varnish (no backend hit)
    if (req.url == "/robots.txt") {
        return(synth(200, "OK"));
    }

    # ... rest of vcl_recv
}

sub vcl_synth {
    if (resp.status == 200 && req.url == "/robots.txt") {
        set resp.http.Content-Type = "text/plain; charset=utf-8";
        set resp.http.Cache-Control = "public, max-age=86400";
        synthetic({"User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"});
        return(deliver);
    }

    if (resp.status == 403) {
        set resp.http.Content-Type = "text/plain; charset=utf-8";
        synthetic("Forbidden");
        return(deliver);
    }
}

Rate limiting with vsthrottle

The vsthrottle VMOD provides per-key rate limiting. It's available in the varnish-modules package (open source) and bundled with Varnish Enterprise:

vcl 4.1;

import vsthrottle;

sub vcl_recv {
    # Block AI bots by UA first (fastest path)
    if (req.http.User-Agent ~ "(?i)(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot)") {
        return(synth(403, "Forbidden"));
    }

    # Rate limit: 100 requests per 10 seconds per key
    # Key: X-Forwarded-For as set by a trusted load balancer;
    # for direct connections use client.identity instead
    if (vsthrottle.is_denied(req.http.X-Forwarded-For, 100, 10s)) {
        return(synth(429, "Too Many Requests"));
    }
}

Install varnish-modules (Ubuntu/Debian)

apt-get install varnish-modules

Install varnish-modules (from source)

git clone https://github.com/varnish/varnish-modules.git
cd varnish-modules
./bootstrap
./configure
make
make install

vsthrottle key selection: Using client.ip as the key works for direct connections. If Varnish is behind a load balancer, use req.http.X-Forwarded-For — but validate it first to prevent IP spoofing. For production, consider a trusted IP header from your load balancer (e.g. req.http.X-Real-IP).
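
One way to apply this advice is to trust the forwarded header only when the TCP peer is one of your own load balancers, and to fall back to client.ip otherwise. The sketch below makes several assumptions: the ACL name trusted_proxies, the 10.0.0.0/8 range, and the helper header X-Throttle-Key are all hypothetical, and the load balancer is assumed to set X-Real-IP.

```vcl
vcl 4.1;

import vsthrottle;

# Hypothetical ACL: the load balancers allowed to set X-Real-IP
acl trusted_proxies {
    "10.0.0.0"/8;
}

sub vcl_recv {
    if (client.ip ~ trusted_proxies && req.http.X-Real-IP) {
        # The peer is our own proxy, so its header can be trusted
        set req.http.X-Throttle-Key = req.http.X-Real-IP;
    } else {
        # Direct or untrusted peer: use the TCP peer address
        set req.http.X-Throttle-Key = client.ip;
    }

    if (vsthrottle.is_denied(req.http.X-Throttle-Key, 100, 10s)) {
        return (synth(429, "Too Many Requests"));
    }

    # Internal helper header, not needed by the backend
    unset req.http.X-Throttle-Key;
}
```

With this arrangement a client cannot escape the limit by sending a different forwarded header on each request, because headers from untrusted peers are never used as the key.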

VCL ACL for IP-based exceptions

VCL's acl statement defines IP ranges. Use it to whitelist your own crawlers or monitoring services from the bot-blocking rules:

vcl 4.1;

import std;
import vsthrottle;

# Trusted IPs — bypass bot blocking (your own crawlers, monitoring)
acl trusted_crawlers {
    "127.0.0.1";
    "10.0.0.0"/8;
    "192.168.0.0"/16;
    "203.0.113.42";     # your monitoring service IP
}

sub vcl_recv {
    # Block AI bots unless the request comes from a trusted IP.
    # Trusted requests skip only this check and are still cached normally
    # (return(pass) would send every trusted request to the backend).
    if (client.ip !~ trusted_crawlers &&
        req.http.User-Agent ~ "(?i)(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot)") {
        std.log("AI bot blocked: " + req.http.User-Agent);
        return(synth(403, "Forbidden"));
    }
}

Full VCL example

vcl 4.1;

import std;
import vsthrottle;

# Backend definition
backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .connect_timeout = 5s;
    .first_byte_timeout = 30s;
    .between_bytes_timeout = 10s;
    .probe = {
        .url = "/health";
        .timeout = 2s;
        .interval = 5s;
        .window = 5;
        .threshold = 3;
    }
}

# Trusted IPs — bypass bot blocking
acl trusted_crawlers {
    "127.0.0.1";
    "10.0.0.0"/8;
    "192.168.0.0"/16;
}

sub vcl_recv {
    # Health check passthrough
    if (req.url == "/health") {
        return(pass);
    }

    # Serve robots.txt from Varnish directly
    if (req.url == "/robots.txt") {
        return(synth(800, "robots"));
    }

    # Block AI bots by User-Agent, except from trusted IPs.
    # Trusted requests skip only this check and are still cached
    # and rate limited like any other request.
    if (client.ip !~ trusted_crawlers &&
        req.http.User-Agent ~ "(?i)(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot)") {
        std.log("AI bot blocked UA: " + req.http.User-Agent);
        return(synth(403, "Forbidden"));
    }

    # Rate limiting: 200 req / 10s per X-Forwarded-For + User-Agent pair
    if (vsthrottle.is_denied(req.http.X-Forwarded-For + req.http.User-Agent, 200, 10s)) {
        return(synth(429, "Too Many Requests"));
    }

    # Strip cookies on static assets (allow caching)
    if (req.url ~ "\.(css|js|png|jpg|jpeg|gif|ico|woff2?|svg)(\?.*)?$") {
        unset req.http.Cookie;
    }

    return(hash);
}

sub vcl_backend_response {
    # Add X-Robots-Tag to all backend responses (cached with object)
    set beresp.http.X-Robots-Tag = "noai, noimageai";

    # Cache static assets for 1 day
    if (bereq.url ~ "\.(css|js|png|jpg|jpeg|gif|ico|woff2?|svg)(\?.*)?$") {
        set beresp.ttl = 1d;
        set beresp.http.Cache-Control = "public, max-age=86400";
        unset beresp.http.Set-Cookie;
    }
}

sub vcl_deliver {
    # Ensure X-Robots-Tag is on every delivery (including cache hits)
    if (!resp.http.X-Robots-Tag) {
        set resp.http.X-Robots-Tag = "noai, noimageai";
    }

    # Add cache status header for debugging
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }

    # Remove Varnish internals from response
    unset resp.http.X-Varnish;
    unset resp.http.Via;
}

sub vcl_synth {
    # robots.txt (custom status 800)
    if (resp.status == 800) {
        set resp.status = 200;
        set resp.http.Content-Type = "text/plain; charset=utf-8";
        set resp.http.Cache-Control = "public, max-age=86400";
        synthetic({"User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: AhrefsBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"});
        return(deliver);
    }

    # 403 Forbidden
    if (resp.status == 403) {
        set resp.http.Content-Type = "text/plain; charset=utf-8";
        synthetic("Forbidden");
        return(deliver);
    }

    # 429 Too Many Requests
    if (resp.status == 429) {
        set resp.http.Content-Type = "text/plain; charset=utf-8";
        set resp.http.Retry-After = "60";
        synthetic("Too Many Requests");
        return(deliver);
    }

    return(deliver);
}

Custom synth status 800: Using status code 800 for the robots.txt synth avoids conflicting with a real 200 response in vcl_synth. Varnish allows any status code in synth() — using a code outside the standard 200–599 range is a common pattern for internal routing logic. Set it back to 200 in vcl_synth before delivering.

Docker deployment

docker-compose.yml

services:
  varnish:
    image: varnish:7.5-alpine
    ports:
      - "80:80"
      - "8443:8443"
    volumes:
      - ./default.vcl:/etc/varnish/default.vcl:ro
    environment:
      - VARNISH_SIZE=256m
    command: >
      -a 0.0.0.0:80,HTTP
      -f /etc/varnish/default.vcl
      -s malloc,256m
    depends_on:
      - app

  app:
    image: your-app:latest
    expose:
      - "8080"

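# Inside this Compose network the backend in default.vcl must point at
# the app service (.host = "app"; .port = "8080";), not 127.0.0.1
# For HTTPS: put nginx or caddy in front of varnish for TLS termination
# Varnish does not handle TLS natively in the open-source version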

Varnish and TLS: Varnish open source does not terminate TLS. For HTTPS, place a TLS-terminating proxy (nginx, Caddy, HAProxy, or Hitch, the open-source TLS proxy maintained by Varnish Software) in front of Varnish. Varnish Enterprise adds native TLS support. Common pattern: Client → nginx (TLS) → Varnish (cache + bot blocking) → app backend.

Reload VCL without restart

# Load new VCL
varnishadm vcl.load newconfig /etc/varnish/default.vcl

# Activate it
varnishadm vcl.use newconfig

# Verify
varnishadm vcl.list

Inspect blocked requests

# Watch bot-block log messages in real time
varnishlog -g request -q 'VCL_Log ~ "AI bot"'

# Count synthetic responses (includes 403s, 429s and the robots.txt synth)
varnishstat -1 -f MAIN.s_synth

FAQ

How do I block AI bots by User-Agent in Varnish?

In vcl_recv, use req.http.User-Agent ~ "(?i)(GPTBot|ClaudeBot|...)" then return(synth(403, "Forbidden")). The ~ operator does PCRE regex matching; (?i) makes it case-insensitive.

What is the difference between vcl_recv and vcl_pass in Varnish?

vcl_recv runs for every incoming request before cache lookup — the correct place for bot blocking. vcl_pass runs when a request is explicitly passed to the backend (bypassing cache). Block in vcl_recv so all requests are checked, cached or not.

How do I add X-Robots-Tag in Varnish?

In vcl_backend_response: set beresp.http.X-Robots-Tag = "noai, noimageai" — cached with the object. Or in vcl_deliver: set resp.http.X-Robots-Tag = "noai, noimageai" — applied on every delivery including cache hits, not stored in cache.

Can Varnish serve robots.txt without hitting the backend?

Yes — detect req.url == "/robots.txt" in vcl_recv and call return(synth(800, "robots")). In vcl_synth, set resp.status = 200, set the Content-Type, and use synthetic() with the robots.txt content.

How do I rate-limit bots in Varnish?

Install the varnish-modules package for the vsthrottle VMOD. In vcl_recv: vsthrottle.is_denied(req.http.X-Forwarded-For, 100, 10s) returns true if the client exceeded 100 requests in 10 seconds. Return synth(429) if denied.

Should I block bots in vcl_recv or at the backend level?

Always in vcl_recv — it fires before cache lookup and before any backend connection. Blocking here means zero backend load from blocked bots. Backend-level blocking wastes a connection and thread for every blocked request.
