How to Block AI Bots on Lighttpd: Complete 2026 Guide
Lighttpd ("lighty") is a lightweight, event-driven web server optimised for high-concurrency, low-memory environments — popular on VPS instances, Raspberry Pi, and embedded Linux. Its module-based config uses conditional blocks that make bot blocking concise and readable.
Required modules
Lighttpd uses a module system. All modules must be listed in server.modules in lighttpd.conf before their directives can be used. The relevant modules for bot blocking are bundled with Lighttpd — no separate installation required:
server.modules = (
    "mod_access",    # url.access-deny — required for UA blocking
    "mod_setenv",    # setenv.add-response-header — required for X-Robots-Tag
    "mod_rewrite",   # url.rewrite-once — optional, for URL manipulation
    "mod_redirect",  # url.redirect — optional
    "mod_accesslog", # access logging
    "mod_fastcgi",   # if using PHP/Python via FastCGI
    "mod_proxy",     # if using as reverse proxy
)

mod_access should be early in the list — it processes requests before they reach content handlers. If you use mod_rewrite, place it before mod_access if rewrites should happen before access checks, or after if access checks should fire first.

User-Agent blocking with mod_access
Lighttpd uses conditional blocks ($HTTP["useragent"]) to match request headers. Inside a match, url.access-deny denies the request with a 403.
# lighttpd.conf
# Block AI training and scraping bots by User-Agent
$HTTP["useragent"] =~ "GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot" {
    url.access-deny = ("")
}

The =~ operator performs PCRE regex matching; alternatives are separated by |. The match is case-sensitive by default — use =~* for case-insensitive matching (Lighttpd 1.4.46+), or add (?i) at the start of the pattern for older versions.

Case-insensitive matching (Lighttpd 1.4.46+)
# =~* operator for case-insensitive regex (Lighttpd 1.4.46+)
$HTTP["useragent"] =~* "gptbot|claudebot|anthropic-ai|ccbot|google-extended|ahrefsbot|bytespider|amazonbot|diffbot|facebookbot|cohere-ai|perplexitybot|youbot" {
    url.access-deny = ("")
}

Case-insensitive matching (older Lighttpd)
# (?i) flag for case-insensitive match (PCRE, older versions)
$HTTP["useragent"] =~ "(?i)(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot)" {
    url.access-deny = ("")
}

url.access-deny matches URL suffixes, so it can also be used to deny specific endings (e.g. ("~", ".inc") for backup and include files). For bot blocking, denying everything with ("") is the correct approach.

X-Robots-Tag with mod_setenv
Use mod_setenv to add response headers globally. Place this after the mod_setenv entry in server.modules:
# lighttpd.conf
setenv.add-response-header = (
"X-Robots-Tag" => "noai, noimageai"
)Multiple headers
setenv.add-response-header = (
"X-Robots-Tag" => "noai, noimageai",
"X-Content-Type-Options" => "nosniff",
"X-Frame-Options" => "SAMEORIGIN",
"Referrer-Policy" => "strict-origin-when-cross-origin"
)add-response-header appends to any existing header with the same name (can create duplicates if the upstream also sets it). set-response-header (Lighttpd 1.4.46+) replaces the existing value. For X-Robots-Tag, prefer set-response-header if your backend might also set it.robots.txt as a static file
Place robots.txt in your document root (configured by server.document-root in lighttpd.conf). Lighttpd serves all static files from the document root by default — no additional configuration needed.
# lighttpd.conf
server.document-root = "/var/www/html"
# robots.txt goes at: /var/www/html/robots.txt

# /var/www/html/robots.txt
User-agent: *
Allow: /
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: YouBot
Disallow: /
Sitemap: https://example.com/sitemap.xml

Conditional blocks — path-specific rules
Lighttpd's conditional blocks can be nested. Block AI bots only on specific paths (e.g. protect an API or blog while allowing crawling of marketing pages):
Block bots site-wide (recommended)
$HTTP["useragent"] =~ "GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended" {
    url.access-deny = ("")
}

Block bots only under /blog/ and /docs/
$HTTP["url"] =~ "^/(blog|docs)/" {
$HTTP["useragent"] =~ "GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended" {
url.access-deny = ("")
}
}Allow a specific bot while blocking others
# Block all AI bots EXCEPT Googlebot
$HTTP["useragent"] =~ "GPTBot|ClaudeBot|anthropic-ai|CCBot|AhrefsBot|Bytespider" {
$HTTP["useragent"] !~ "Googlebot" {
url.access-deny = ("")
}
}=~— matches regex (case-sensitive)=~*— matches regex (case-insensitive, 1.4.46+)!~— does not match regex==— exact string match!=— not equal
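Before deploying a pattern, you can sanity-check the two matching modes with ordinary PCRE-style regexes. Python's re module stands in for Lighttpd's matcher here (an approximation — verify against the running server), and the sample User-Agent strings are made up for illustration:

```python
import re

# The same alternation used in the lighttpd.conf examples above
pattern = "GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended"

# Case-sensitive, like the plain =~ operator
case_sensitive = re.compile(pattern)
# Case-insensitive, like =~* or a (?i)-prefixed pattern
case_insensitive = re.compile(pattern, re.IGNORECASE)

user_agents = [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; gptbot/1.0)",  # lower-cased variant
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0",
]

for ua in user_agents:
    print(ua,
          "| sensitive:", bool(case_sensitive.search(ua)),
          "| insensitive:", bool(case_insensitive.search(ua)))

# The lower-cased "gptbot" slips past the case-sensitive pattern
# but is caught by the case-insensitive one.
```

This is why =~* (or the (?i) prefix) is the safer default: bots do not always use the exact casing published in their documentation.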
Rate limiting options
Lighttpd does not have built-in request rate limiting like nginx or HAProxy. Options:
Option 1: Connection limiting (built-in)
# Limit total concurrent connections (server-wide)
server.max-connections = 1024
# Per-IP concurrent connection limit via mod_evasive
# (add "mod_evasive" to server.modules)
$HTTP["remoteip"] !~ "^(127\.0\.0\.1|10\.)" {
    evasive.max-conns-per-ip = 20
}

Option 2: iptables rate limiting (OS level)
# Limit each IP to 60 new connections per minute to port 80/443
iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m limit --limit 60/min --limit-burst 20 -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -m state --state NEW -j DROP
ip6tables -A INPUT -p tcp --dport 443 -m state --state NEW -m limit --limit 60/min --limit-burst 20 -j ACCEPT

Option 3: fail2ban integration
# /etc/fail2ban/filter.d/lighttpd-bot.conf
[Definition]
failregex = ^<HOST> .* "(GET|POST|HEAD) .* HTTP/.*" 403
ignoreregex =
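fail2ban substitutes <HOST> with its own host-matching group; you can approximate that locally to confirm the filter matches denied-bot log lines before enabling the jail. In this Python sketch the IPv4 group standing in for <HOST> and the sample log line are both illustrative assumptions (the line follows the accesslog.format used later in this guide):

```python
import re

# The failregex from filter.d/lighttpd-bot.conf, with <HOST>
# replaced by a simple IPv4 capture group for local testing
failregex = r'^(?P<host>\d{1,3}(?:\.\d{1,3}){3}) .* "(GET|POST|HEAD) .* HTTP/.*" 403'

# A hypothetical access-log line for a denied GPTBot request
line = ('203.0.113.7 example.com - [10/May/2026:12:00:00 +0000] '
        '"GET /blog/post HTTP/1.1" 403 345 "-" "GPTBot/1.0"')

m = re.search(failregex, line)
print("matched host:", m.group("host") if m else None)

# A 200 response from a regular browser must NOT match,
# or fail2ban would ban legitimate visitors
ok_line = line.replace(" 403 ", " 200 ")
print("200 line matches:", bool(re.search(failregex, ok_line)))
```

fail2ban ships its own tester for the real filter file: fail2ban-regex /var/log/lighttpd/access.log /etc/fail2ban/filter.d/lighttpd-bot.conf.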
# /etc/fail2ban/jail.local
[lighttpd-bot]
enabled = true
port = http,https
filter = lighttpd-bot
logpath = /var/log/lighttpd/access.log
maxretry = 10
findtime = 60
bantime = 3600

Full lighttpd.conf example
# /etc/lighttpd/lighttpd.conf
server.modules = (
"mod_access",
"mod_setenv",
"mod_accesslog",
"mod_rewrite",
"mod_redirect",
"mod_compress",
"mod_fastcgi",
)
# Basic server config
server.document-root = "/var/www/html"
server.port = 80
server.bind = "0.0.0.0"
server.username = "www-data"
server.groupname = "www-data"
server.pid-file = "/run/lighttpd.pid"
server.errorlog = "/var/log/lighttpd/error.log"
# Access logging
accesslog.filename = "/var/log/lighttpd/access.log"
accesslog.format = "%h %V %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""
# MIME types
mimetype.assign = (
    ".html" => "text/html; charset=utf-8",
    ".css" => "text/css",
    ".js" => "application/javascript",
    ".json" => "application/json",
    ".png" => "image/png",
    ".jpg" => "image/jpeg",
    ".svg" => "image/svg+xml",
    ".woff2" => "font/woff2",
    ".txt" => "text/plain",
    ".xml" => "application/xml",
)
# Index files
index-file.names = ("index.html", "index.php")
# ── Bot blocking ─────────────────────────────────────────────────────────────
# Block AI training and scraping bots by User-Agent
$HTTP["useragent"] =~ "(?i)(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot)" {
    url.access-deny = ("")
}
# ── Response headers ─────────────────────────────────────────────────────────
setenv.add-response-header = (
"X-Robots-Tag" => "noai, noimageai",
"X-Content-Type-Options" => "nosniff",
"X-Frame-Options" => "SAMEORIGIN",
"Referrer-Policy" => "strict-origin-when-cross-origin",
)
# ── Static file caching ───────────────────────────────────────────────────────
$HTTP["url"] =~ ".(css|js|png|jpg|jpeg|gif|ico|woff2|svg)$" {
expire.url = ( "" => "access plus 1 months" )
}
# ── HTTPS redirect (if handling SSL offload) ──────────────────────────────────
# Typically done at the load balancer/proxy level
# $HTTP["scheme"] == "http" {
# url.redirect = ( "^/(.*)" => "https://example.com/$1" )
# }

Test config and reload
# Test config syntax (-t); -tt performs a fuller pre-flight check
lighttpd -t -f /etc/lighttpd/lighttpd.conf
# Apply config changes (lighttpd does not re-read its config on SIGHUP)
systemctl restart lighttpd
# Or graceful restart via signal (Lighttpd 1.4.46+); SIGHUP only re-opens log files
kill -USR1 $(cat /run/lighttpd.pid)

Docker deployment
docker-compose.yml
services:
  lighttpd:
    image: sebp/lighttpd:latest
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./lighttpd.conf:/etc/lighttpd/lighttpd.conf:ro
      - ./html:/var/www/html:ro
      - ./ssl:/etc/lighttpd/ssl:ro
    restart: unless-stopped

Minimal Dockerfile
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y lighttpd && rm -rf /var/lib/apt/lists/*
COPY lighttpd.conf /etc/lighttpd/lighttpd.conf
COPY html/ /var/www/html/
EXPOSE 80
CMD ["lighttpd", "-D", "-f", "/etc/lighttpd/lighttpd.conf"]

FAQ
How do I block AI bots by User-Agent in Lighttpd?
Use mod_access with a $HTTP["useragent"] conditional block: $HTTP["useragent"] =~ "GPTBot|ClaudeBot|..." { url.access-deny = ("") }. The =~ operator does regex matching; =~* is case-insensitive (Lighttpd 1.4.46+).
What modules do I need to block AI bots in Lighttpd?
mod_access for url.access-deny and mod_setenv for setenv.add-response-header. Both are bundled with Lighttpd — just add them to server.modules in lighttpd.conf.
How do I add X-Robots-Tag in Lighttpd?
setenv.add-response-header = ("X-Robots-Tag" => "noai, noimageai") after loading mod_setenv. Use set-response-header (1.4.46+) instead of add-response-header if your backend might also set it, to avoid duplicates.
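The append-vs-replace difference can be sketched with a minimal header map. This is an illustration of the semantics only, not of Lighttpd's implementation; the helper names are invented for the example:

```python
def add_response_header(headers, name, value):
    # add-response-header semantics: append, may produce duplicates
    headers.setdefault(name, []).append(value)

def set_response_header(headers, name, value):
    # set-response-header semantics: replace any existing value
    headers[name] = [value]

# Suppose a PHP backend already emitted X-Robots-Tag: noindex
headers = {"X-Robots-Tag": ["noindex"]}
add_response_header(headers, "X-Robots-Tag", "noai, noimageai")
print(headers["X-Robots-Tag"])  # two values: the response carries duplicates

headers = {"X-Robots-Tag": ["noindex"]}
set_response_header(headers, "X-Robots-Tag", "noai, noimageai")
print(headers["X-Robots-Tag"])  # one value: the backend's header is replaced
```

Note that replacing discards the backend's noindex; if you need both directives, combine them into one value yourself.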
How do I serve robots.txt in Lighttpd?
Place robots.txt in server.document-root. Lighttpd serves static files from the document root automatically — no extra config needed.
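Compliant crawlers read each per-agent group independently, falling back to the * group when no dedicated group matches. Python's standard urllib.robotparser can preview how a given bot will interpret your rules (a local sketch with inline rules, not a fetch from a live server):

```python
from urllib.robotparser import RobotFileParser

# A trimmed version of the robots.txt from this guide
robots_txt = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# GPTBot has its own group and is fully disallowed
print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))        # False
# Agents without a dedicated group fall back to the * rules
print(parser.can_fetch("SomeOtherBot", "https://example.com/blog/post"))  # True
```

Remember that robots.txt is advisory: the mod_access rules above are what actually enforce the block for bots that ignore it.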
Does Lighttpd support rate limiting?
Not built-in for request rate limiting. Options: connection limiting with connection.limit (per-IP, 1.4.46+), OS-level iptables rate limiting, or fail2ban parsing access logs. For advanced rate limiting, put Cloudflare or nginx in front of Lighttpd.
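The iptables rule in Option 2 (--limit 60/min --limit-burst 20) implements a token bucket: a burst allowance of 20 that refills at one token per second. A small Python simulation makes the semantics concrete; this illustrates the rate-limiting model, not netfilter's internals:

```python
class TokenBucket:
    """Token bucket mirroring iptables: --limit 60/min --limit-burst 20."""

    def __init__(self, rate_per_sec=1.0, burst=20):
        self.rate = rate_per_sec    # refill rate: 60/min = 1 token/sec
        self.burst = burst          # bucket capacity (--limit-burst)
        self.tokens = float(burst)  # the bucket starts full
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # -j ACCEPT
        return False      # falls through to the DROP rule

bucket = TokenBucket()
# 30 connection attempts within the same second: the first 20 pass
# (the burst), the rest are dropped until tokens refill.
results = [bucket.allow(now=0.0) for _ in range(30)]
print(results.count(True), "accepted,", results.count(False), "dropped")
```

The burst absorbs short spikes from legitimate visitors while sustained scraping settles to the steady 60/min rate.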
Can I use conditional blocks to block bots on specific paths?
Yes — nest conditions: $HTTP["url"] =~ "^/blog/" { $HTTP["useragent"] =~ "GPTBot" { url.access-deny = ("") } }. Use !~ to invert a match (block everything except a pattern).