How to Block AI Bots on Caddy: Complete 2026 Guide
Caddy's named matchers and clean Caddyfile syntax make bot blocking concise: define a @bad_bot matcher on User-Agent, then respond @bad_bot 403. Caddy's automatic HTTPS via Let's Encrypt means no SSL config — just point a domain at your server. The key behaviour to understand: Caddy evaluates directives by precedence, not top-to-bottom — respond always fires before reverse_proxy regardless of order in the Caddyfile.
Directive order ≠ execution order in Caddy
Unlike nginx and Apache, Caddy evaluates directives in a fixed precedence order, not top-to-bottom. You don't need to put respond @bad_bot 403 before reverse_proxy in the Caddyfile — Caddy's precedence list ensures respond is evaluated first. But writing it first is still good practice for readability.
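A minimal sketch of that behaviour, assuming a backend on localhost:3000 (a placeholder address, not taken from any real deployment). In the first site block the directives are deliberately written "backwards", yet respond is still sorted ahead of reverse_proxy. The second block wraps the same directives in route, which tells Caddy to keep them in the order written, so the order on the page matters again.

```caddyfile
# Sketch only: hostnames and the upstream address are placeholders.
example.com {
	@bad_bot header User-Agent *GPTBot*

	# Written first, but sorted after respond by Caddy's directive order.
	reverse_proxy localhost:3000

	# Written last, still evaluated before reverse_proxy.
	respond @bad_bot 403
}

other.example.com {
	@bad_bot header User-Agent *GPTBot*

	# Inside route, directives run in the order written,
	# so the bot check has to come first here.
	route {
		respond @bad_bot 403
		reverse_proxy localhost:3000
	}
}
```

To inspect the order Caddy actually produced, caddy adapt --config Caddyfile --pretty prints the resulting JSON routes.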
Methods at a glance
| Method | What it does | Caddyfile directive |
|---|---|---|
| robots.txt via file_server | Signals bots which paths are off-limits | file_server |
| @bad_bot header matcher + respond 403 | Hard block on known AI User-Agents | respond @bad_bot 403 |
| header X-Robots-Tag | noai header on all HTTP responses | header |
| noai <meta> tag | AI training opt-out per HTML page | HTML files / SSG layout |
| caddy-ratelimit plugin | Rate-limit to catch UA-rotating bots | rate_limit (plugin) |
| reverse_proxy | Proxy to backend after bot check | reverse_proxy |
1. robots.txt — file_server
Place robots.txt in the directory set by the root directive. Caddy's file_server serves it automatically at /robots.txt — no location block needed.
# Caddyfile
example.com {
root * /var/www/html
file_server
# robots.txt is served automatically from root directory
# No special config needed
}
# /var/www/html/robots.txt
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: *
Allow: /
2. Hard 403 — header matcher
Define a named matcher (@bad_bot) using the header matcher on User-Agent. Multiple header User-Agent lines inside the same named matcher are evaluated as OR conditions. The *wildcard* syntax matches the pattern anywhere in the header value (case-sensitive by default — use the header_regexp matcher for case-insensitive matching).
example.com {
root * /var/www/html
# Named matcher — multiple header lines = OR conditions
@bad_bot {
header User-Agent *GPTBot*
header User-Agent *ChatGPT-User*
header User-Agent *ClaudeBot*
header User-Agent *Claude-Web*
header User-Agent *anthropic-ai*
header User-Agent *CCBot*
header User-Agent *Google-Extended*
header User-Agent *PerplexityBot*
header User-Agent *Amazonbot*
header User-Agent *Bytespider*
header User-Agent *YouBot*
header User-Agent *Applebot*
header User-Agent *DuckAssistBot*
header User-Agent *meta-externalagent*
header User-Agent *MistralAI-Spider*
header User-Agent *oai-searchbot*
}
# Return 403 for matched bots
# Caddy evaluates respond before reverse_proxy/file_server regardless of order
respond @bad_bot 403
file_server
}
header vs header_regexp matcher
header User-Agent *GPTBot*: glob-style wildcard, case-sensitive
header_regexp bot User-Agent (?i)(GPTBot|ClaudeBot): regular expression named bot, with (?i) for case-insensitive matching
For a long list of bots, header_regexp with a single pipe-separated regex is more concise. The name bot identifies the regexp so that its capture groups can be referenced through placeholders such as {re.bot.1}.
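One practical use of the regexp name, sketched under the assumption that echoing the matched bot back in the response body is acceptable for your site: the first capture group of the regexp named bot is available as the placeholder {re.bot.1}, so the 403 body can state which pattern matched. The body text is illustrative only.

```caddyfile
example.com {
	# "bot" names the regexp; group 1 holds whichever alternative matched.
	@bad_bot header_regexp bot User-Agent (?i)(GPTBot|ClaudeBot|CCBot)

	# {re.bot.1} expands to the text captured from the User-Agent header.
	respond @bad_bot "Blocked crawler: {re.bot.1}" 403

	file_server
}
```

Requesting the site with curl -A "GPTBot/1.0" https://example.com should then return a 403 whose body names the matched crawler.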
Alternative using header_regexp for a more compact single-matcher approach:
example.com {
@bad_bot header_regexp bot User-Agent (?i)(GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider|YouBot|DuckAssistBot|meta-externalagent|MistralAI-Spider|oai-searchbot)
respond @bad_bot 403
file_server
}
3. X-Robots-Tag — header directive
Caddy's header directive adds, sets, or removes response headers. Place it in the site block for all responses. Unlike nginx's always keyword, Caddy adds headers to all responses including error responses by default.
example.com {
# Add X-Robots-Tag to all responses (including 4xx/5xx)
header X-Robots-Tag "noai, noimageai"
# Or add multiple security headers at once:
header {
X-Robots-Tag "noai, noimageai"
X-Frame-Options "SAMEORIGIN"
X-Content-Type-Options "nosniff"
}
# Scope to HTML responses only (using path matcher):
@html path *.html /
header @html X-Robots-Tag "noai, noimageai"
}
header vs header_down for reverse proxy
When using reverse_proxy, the header directive applies to Caddy's response to the client. If your upstream already sets X-Robots-Tag and you want to override it, use header_down inside the reverse_proxy block to modify the upstream response before Caddy forwards it.
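A minimal sketch of the header_down approach, assuming an upstream on localhost:3000 that may send its own X-Robots-Tag (the address is a placeholder). A plain field name sets the header on the proxied response, replacing any upstream value; the - prefix removes it instead.

```caddyfile
example.com {
	reverse_proxy localhost:3000 {
		# Set X-Robots-Tag on the proxied response,
		# replacing any value the upstream sent.
		header_down X-Robots-Tag "noai, noimageai"

		# Alternative: strip the upstream header and leave it unset.
		# header_down -X-Robots-Tag
	}
}
```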
4. noai meta tag — static sites
Caddy serves HTML files as-is — it does not inject content. Add the noai meta tag to your HTML files directly, or to your SSG base layout template.
<!-- In your HTML <head> -->
<meta name="robots" content="noai, noimageai">
<!-- Hugo base layout (layouts/_default/baseof.html): -->
<meta name="robots" content="{{ with .Params.robots }}{{ . }}{{ else }}noai, noimageai{{ end }}">
<!-- Eleventy base layout (_includes/base.njk): -->
<meta name="robots" content="{{ robots | default('noai, noimageai') }}">
Use X-Robots-Tag (Section 3) as the HTTP-layer equivalent; it needs no HTML edits.
5. Rate limiting — caddy-ratelimit plugin
Caddy does not include rate limiting in the standard distribution. Install the caddy-ratelimit plugin by building Caddy with xcaddy, or use Cloudflare in front of Caddy for free rate limiting without a custom build.
# Build Caddy with rate limit plugin:
go install github.com/caddyserver/xcaddy/cmd/xcaddy@latest
xcaddy build --with github.com/mholt/caddy-ratelimit
# Caddyfile with rate limiting:
example.com {
@bad_bot header_regexp bot User-Agent (?i)(GPTBot|ClaudeBot|CCBot|PerplexityBot|Bytespider)
respond @bad_bot 403
# Rate limit: 10 requests per second per IP
rate_limit {
zone dynamic {
key {remote_host}
events 10
window 1s
}
}
file_server
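}
Without the plugin, Caddy has no built-in way to throttle requests: reverse_proxy health checks and load balancing spread traffic across upstreams but do not limit how fast a single client can request pages. In that case, proxy through Cloudflare and rate-limit there.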
6. Reverse proxy setup
Caddy's reverse_proxy directive forwards requests to an upstream server. The @bad_bot matcher and respond 403 fire before proxying — blocked bots never reach your backend.
example.com {
# Bot check — fires before reverse_proxy due to directive precedence
@bad_bot header_regexp bot User-Agent (?i)(GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider|YouBot|DuckAssistBot|meta-externalagent|MistralAI-Spider|oai-searchbot)
respond @bad_bot 403
# X-Robots-Tag on all responses
header X-Robots-Tag "noai, noimageai"
# Serve robots.txt locally — don't proxy it
handle /robots.txt {
root * /var/www/html
file_server
}
# Proxy everything else to backend
reverse_proxy localhost:3000 {
# Pass real client IP to upstream
header_up X-Real-IP {remote_host}
header_up X-Forwarded-For {remote_host}
header_up X-Forwarded-Proto {scheme}
}
}
7. Full Caddyfile example
A complete production Caddyfile with automatic HTTPS, bot blocking, X-Robots-Tag, and both static site and reverse proxy patterns. Caddy handles Let's Encrypt automatically — no SSL config needed.
# Caddyfile — production ready
# Global options (optional)
{
email admin@example.com # Let's Encrypt notifications
# admin off # Disable admin API in production
}
example.com {
# ── AI bot blocking ────────────────────────────────────────────────
@bad_bot header_regexp bot User-Agent (?i)(GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider|YouBot|Applebot|DuckAssistBot|meta-externalagent|MistralAI-Spider|oai-searchbot)
respond @bad_bot 403
# ── AI training opt-out header ─────────────────────────────────────
header X-Robots-Tag "noai, noimageai"
# ── Static site (comment out if using reverse_proxy) ───────────────
root * /var/www/html
file_server
# ── Reverse proxy (comment out file_server above, uncomment below) ─
# handle /robots.txt {
# root * /var/www/html
# file_server
# }
# reverse_proxy localhost:3000 {
# header_up X-Real-IP {remote_host}
# header_up X-Forwarded-For {remote_host}
# header_up X-Forwarded-Proto {scheme}
# }
# ── Logging ────────────────────────────────────────────────────────
log {
output file /var/log/caddy/access.log
format json
}
}
# Redirect www → apex
www.example.com {
redir https://example.com{uri} permanent
}
8. Docker deployment
The official caddy Docker image includes automatic HTTPS. Mount your Caddyfile and a data volume (for certificates) — Caddy handles everything else.
# docker-compose.yml
services:
caddy:
image: caddy:alpine
ports:
- "80:80"
- "443:443"
- "443:443/udp" # HTTP/3
volumes:
- ./Caddyfile:/etc/caddy/Caddyfile:ro
- ./dist:/var/www/html:ro
- caddy_data:/data # TLS certificates — persist this!
- caddy_config:/config
restart: unless-stopped
volumes:
caddy_data:
caddy_config:
# Dockerfile: custom build with plugins
FROM caddy:builder AS builder
RUN xcaddy build \
--with github.com/mholt/caddy-ratelimit
FROM caddy:alpine
COPY --from=builder /usr/bin/caddy /usr/bin/caddy
COPY Caddyfile /etc/caddy/Caddyfile
COPY dist/ /var/www/html/
# Reload config without downtime:
docker exec caddy-container caddy reload --config /etc/caddy/Caddyfile
# Validate config:
docker exec caddy-container caddy validate --config /etc/caddy/Caddyfile
9. Dynamic updates via JSON API
Caddy's admin API accepts live config updates over HTTP — no restart needed. This lets you add bots to a blocklist programmatically. The admin endpoint runs on localhost:2019 by default.
# Get current config:
curl http://localhost:2019/config/
# Reload from Caddyfile (converts to JSON internally):
curl -X POST http://localhost:2019/load \
-H "Content-Type: text/caddyfile" \
--data-binary @Caddyfile
# Insert a new blocking route at the front of the route list via the JSON API:
# (Advanced: use only if you need dynamic updates without a Caddyfile reload)
# PUT with an array index inserts at that position; PATCH would replace the existing value at the path.
curl -X PUT http://localhost:2019/config/apps/http/servers/srv0/routes/0 \
-H "Content-Type: application/json" \
-d '{"@id":"bot-block","match":[{"header_regexp":{"User-Agent":{"pattern":"NewBotName","name":"bot"}}}],"handle":[{"handler":"static_response","status_code":403}]}'
Disable the admin API in production if you don't need live updates: { admin off } in the global options block.
Frequently asked questions
How do I match User-Agent in Caddy?
Use the header matcher: header User-Agent *GPTBot*. The * wildcard matches anywhere in the string (case-sensitive). For case-insensitive matching across many bots, use header_regexp with a single pipe-separated pattern: header_regexp bot User-Agent (?i)(GPTBot|ClaudeBot|CCBot). Multiple header User-Agent lines in one named matcher are OR conditions.
Does Caddy serve robots.txt automatically?
Yes — with file_server enabled and robots.txt in the root directory, Caddy serves it at /robots.txt with no additional config. No special location block required.
How does Caddy directive ordering work?
Caddy evaluates directives in a fixed precedence order, not top-to-bottom. respond has higher precedence than reverse_proxy and file_server, so respond @bad_bot 403 always fires first regardless of where you write it. Writing it first is still recommended for readability.
How do I add X-Robots-Tag in Caddy?
header X-Robots-Tag "noai, noimageai" in the site block. Caddy adds headers to all responses including error responses by default — no always keyword needed (unlike nginx). For reverse proxy, use header_down inside the reverse_proxy block to modify upstream response headers.
Does Caddy have built-in rate limiting?
No — rate limiting requires the caddy-ratelimit plugin, installed by building Caddy with xcaddy build --with github.com/mholt/caddy-ratelimit. The standard Docker image and binary do not include it. Alternatively, put Caddy behind Cloudflare for free rate limiting without a custom build.
What is the difference between Caddyfile and JSON config?
The Caddyfile is a human-readable format Caddy converts to JSON internally. JSON config is Caddy's native format — it exposes every option and can be updated live via the admin API (POST localhost:2019/load) without restarting. Use Caddyfile for simplicity; use the JSON API for programmatic config updates (e.g. dynamically adding bots to a blocklist). Disable the admin API in production if you don't need it: { admin off }.