How to Block AI Bots on OpenResty (Nginx + Lua): Complete 2026 Guide
OpenResty embeds LuaJIT inside nginx, so Lua code can run in each request-processing phase. access_by_lua_block runs in the access phase, before proxy_pass. Calling ngx.exit(ngx.HTTP_FORBIDDEN) there means the upstream server never receives the request. Match with string.find(ua, pattern, 1, true): the fourth argument enables plain-text matching, so the hyphen in names such as chatgpt-user is compared literally instead of being read as a Lua pattern quantifier (Lua uses its own pattern syntax, not regular expressions).
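The effect of the plain flag is easy to check in a standalone Lua interpreter, outside nginx. The snippet below is a minimal sketch; the User-Agent string is a made-up sample, not a value taken from real traffic.

```lua
-- Illustrative only: shows why the fourth argument to string.find matters.
local ua = string.lower("Mozilla/5.0 ChatGPT-User/1.0")

-- Pattern mode (the default): "t-" means "as few 't' characters as
-- possible", so the hyphen is never matched literally and the search fails.
local as_pattern = string.find(ua, "chatgpt-user")

-- Plain mode: the needle is compared character by character.
local plain_start, plain_end = string.find(ua, "chatgpt-user", 1, true)

print(as_pattern)             -- nil
print(plain_start, plain_end) -- 13      24
```

Without the flag, the same pattern would instead match a User-Agent containing "chatgptuser" with no hyphen, which is the opposite of what the block list intends.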
OpenResty request phases — bot check goes in access
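access_by_lua_block fires before the content phase. ngx.exit(403) there ends normal processing: the content phase, and with it proxy_pass, never runs, so the upstream server never receives the blocked request. nginx still generates the 403 response itself, and output filters such as header_filter_by_lua_block apply to that response as well.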
Step 1: Load bot patterns at startup (init_by_lua_block)
init_by_lua_block runs once in the nginx master process when the configuration is loaded. Worker processes are forked from the master, so every worker inherits the globals set here through copy-on-write memory. The bot pattern table is built once at startup, not once per request.
# nginx.conf — http block
# init_by_lua_block runs once at nginx start (master process).
# Bot patterns table is available in all worker processes.
http {
lua_package_path '/etc/nginx/lua/?.lua;;';
init_by_lua_block {
AI_BOT_PATTERNS = {
-- OpenAI
"gptbot", "chatgpt-user", "oai-searchbot",
-- Anthropic
"claudebot", "claude-web",
-- Common Crawl
"ccbot",
-- Bytedance
"bytespider",
-- Meta
"meta-externalagent",
-- Perplexity
"perplexitybot",
-- Google AI
"google-extended", "googleother",
-- Cohere
"cohere-ai",
-- Amazon
"amazonbot",
-- Diffbot
"diffbot",
-- AI2
"ai2bot",
-- DeepSeek
"deepseekbot",
-- Mistral
"mistralai-user",
-- xAI
"xai-bot",
-- You.com
"youbot",
-- DuckDuckGo AI
"duckassistbot",
}
}
# ... server blocks below
}
Step 2: Access phase check + response header filter
location = /robots.txt (exact match) has higher nginx priority than location / — it serves robots.txt before any Lua code runs. The access_by_lua_block in location / never sees robots.txt requests.
# nginx.conf — server block
server {
listen 80;
server_name example.com;
# robots.txt — served BEFORE the access_by_lua_block check.
# location = is an exact match — highest priority in nginx.
# AI bots must be able to read robots.txt even if they'd be blocked elsewhere.
location = /robots.txt {
root /var/www/html;
# default_type sets the MIME type; add_header would add a duplicate Content-Type.
default_type text/plain;
# Do NOT add access_by_lua_block here — allow all crawlers unconditionally.
}
# All other locations — bot check applied
location / {
# access_by_lua_block runs at the ACCESS phase.
# If ngx.exit() is called here, the proxy_pass below NEVER executes.
# The upstream server never receives the blocked request.
access_by_lua_block {
local ua = ngx.var.http_user_agent or ""
local ua_lower = string.lower(ua)
for _, pattern in ipairs(AI_BOT_PATTERNS) do
-- string.find(str, pattern, init, plain=true)
-- plain=true: literal match — hyphens in bot names need no escaping.
if string.find(ua_lower, pattern, 1, true) then
ngx.header["X-Robots-Tag"] = "noai, noimageai"
ngx.log(ngx.WARN, "AI bot blocked: " .. ua)
ngx.exit(ngx.HTTP_FORBIDDEN) -- 403, stops all subsequent phases
end
end
}
# header_filter_by_lua_block runs during the HEADER FILTER phase.
# It applies to every response sent from this location: proxied upstream
# responses and the 403 that nginx generates for blocked bots.
header_filter_by_lua_block {
ngx.header["X-Robots-Tag"] = "noai, noimageai"
}
proxy_pass http://backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
Step 3: Separate Lua module (/etc/nginx/lua/ai_bots.lua)
For maintainability, extract the bot list into a separate .lua file. Load it via require "ai_bots" in init_by_lua_block. Set lua_package_path in the http block.
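Because the matching logic is plain Lua with no dependency on the ngx API, it can be exercised from a standalone interpreter before it is deployed. The sketch below assumes a shortened pattern list and a hypothetical wrapper, is_ai_bot_raw, that lowercases the raw header value; neither is part of the module that follows, and the User-Agent strings are made-up samples.

```lua
-- Stand-in for the module's matcher with a shortened list; illustration only.
local patterns = { "gptbot", "chatgpt-user", "claudebot" }

-- Returns true if the (already lowercased) ua contains any pattern.
local function is_ai_bot(ua)
  for _, pattern in ipairs(patterns) do
    if string.find(ua, pattern, 1, true) then
      return true
    end
  end
  return false
end

-- Hypothetical wrapper: normalizes a raw header value (possibly nil)
-- before matching, so callers cannot forget the lowercase step.
local function is_ai_bot_raw(raw_ua)
  return is_ai_bot(string.lower(raw_ua or ""))
end

assert(is_ai_bot_raw("Mozilla/5.0 (compatible; GPTBot/1.0)"))
assert(is_ai_bot_raw("ChatGPT-User/1.0"))
assert(not is_ai_bot_raw("Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"))
assert(not is_ai_bot_raw(nil))
print("all checks passed")
```

Folding the lowercase step into the module is a design choice: it costs one extra function call per request but removes the "ua must be lowercase" precondition from every caller.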
-- /etc/nginx/lua/ai_bots.lua — separate Lua module
-- Loaded with: lua_package_path '/etc/nginx/lua/?.lua;;'
-- In nginx.conf: require "ai_bots" in init_by_lua_block
local M = {}
local patterns = {
"gptbot", "chatgpt-user", "oai-searchbot",
"claudebot", "claude-web",
"ccbot",
"bytespider",
"meta-externalagent",
"perplexitybot",
"google-extended", "googleother",
"cohere-ai",
"amazonbot",
"diffbot",
"ai2bot",
"deepseekbot",
"mistralai-user",
"xai-bot",
"youbot",
"duckassistbot",
}
-- is_ai_bot: returns true if ua matches any known AI bot pattern.
-- ua must be lowercase before calling.
function M.is_ai_bot(ua)
for _, pattern in ipairs(patterns) do
if string.find(ua, pattern, 1, true) then
return true
end
end
return false
end
return M
-- ----------------------------------------------------------------
-- nginx.conf usage:
--
-- init_by_lua_block {
-- ai_bots = require "ai_bots"
-- }
--
-- access_by_lua_block {
-- local ua = string.lower(ngx.var.http_user_agent or "")
-- if ai_bots.is_ai_bot(ua) then
-- ngx.header["X-Robots-Tag"] = "noai, noimageai"
-- ngx.exit(ngx.HTTP_FORBIDDEN)
-- end
-- }
Step 4: Dynamic updates and metrics with lua_shared_dict
lua_shared_dict is a shared memory zone readable and writable from all worker processes atomically. Use it to track block counts per bot and to add new bot patterns at runtime without an nginx reload.
# lua_shared_dict — dynamic bot list + metrics without nginx restart
http {
# Shared memory zones: readable/writable from ALL worker processes.
# Contents survive nginx -s reload (HUP) but are lost when nginx stops.
lua_shared_dict bot_metrics 10m;  # counters per bot name
lua_shared_dict dynamic_bots 1m;  # runtime-added patterns
init_by_lua_block {
-- Static patterns (compile-time)
AI_BOT_PATTERNS = { "gptbot", "claudebot", "ccbot", ... }
}
server {
location / {
access_by_lua_block {
local ua = string.lower(ngx.var.http_user_agent or "")
-- Check static patterns
for _, pattern in ipairs(AI_BOT_PATTERNS) do
if string.find(ua, pattern, 1, true) then
-- Increment counter in shared dict (atomic)
local metrics = ngx.shared.bot_metrics
metrics:incr(pattern, 1, 0)
ngx.header["X-Robots-Tag"] = "noai, noimageai"
ngx.exit(ngx.HTTP_FORBIDDEN)
end
end
-- Check dynamic patterns added at runtime via /admin endpoint.
-- get_keys() returns at most 1024 keys by default and locks the dict
-- while it runs, so keep this zone small.
local dynamic = ngx.shared.dynamic_bots
local keys = dynamic:get_keys()
for _, key in ipairs(keys) do
if string.find(ua, key, 1, true) then
ngx.header["X-Robots-Tag"] = "noai, noimageai"
ngx.exit(ngx.HTTP_FORBIDDEN)
end
end
}
proxy_pass http://backend;
}
# Admin endpoint to add dynamic bot patterns at runtime
# (Protect this with allow/deny or authentication in production)
location = /admin/block-bot {
allow 127.0.0.1;
deny all;
content_by_lua_block {
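-- Query args arrive URL-encoded; decode and lowercase so the stored
-- pattern matches the lowercased User-Agent in the access check.
local raw = ngx.var.arg_pattern
local pattern = raw and string.lower(ngx.unescape_uri(raw))
if pattern and #pattern > 0 then
ngx.shared.dynamic_bots:set(pattern, true)
ngx.say("blocked: " .. pattern)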
else
ngx.status = 400
ngx.say("missing ?pattern=")
end
}
}
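# Sketch (not part of the original config): a read-only endpoint that dumps
# the per-bot block counters written by metrics:incr() above. The path
# /admin/bot-metrics is a hypothetical name chosen for illustration.
location = /admin/bot-metrics {
allow 127.0.0.1;
deny all;
content_by_lua_block {
local metrics = ngx.shared.bot_metrics
ngx.header["Content-Type"] = "text/plain"
-- get_keys(0) returns every key; acceptable here because the zone
-- holds only one counter per static pattern.
for _, key in ipairs(metrics:get_keys(0)) do
ngx.say(key, " ", metrics:get(key) or 0)
end
}
}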
}
}
Step 5: Docker deployment
The official openresty/openresty Docker image includes LuaJIT and the standard OpenResty modules. Mount your config and Lua files as volumes to edit them without rebuilding the image; changes still take effect only after nginx -s reload.
# Dockerfile — OpenResty with custom Lua config
FROM openresty/openresty:1.25.3-bookworm
# Copy nginx config and Lua modules
COPY nginx.conf /etc/nginx/nginx.conf
COPY lua/ /etc/nginx/lua/
COPY html/ /var/www/html/
EXPOSE 80
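# -c points nginx at the copied config; the image's default config lives
# under /usr/local/openresty/nginx/conf/.
CMD ["/usr/local/openresty/nginx/sbin/nginx", "-c", "/etc/nginx/nginx.conf", "-g", "daemon off;"]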
# ----------------------------------------------------------------
# docker-compose.yml
# version: "3.9"
# services:
# openresty:
# build: .
# ports:
# - "80:80"
# volumes:
# - ./nginx.conf:/etc/nginx/nginx.conf:ro
# - ./lua:/etc/nginx/lua:ro
# - ./html:/var/www/html:ro
# ----------------------------------------------------------------
# robots.txt — /var/www/html/robots.txt
# User-agent: *
# Allow: /
#
# User-agent: GPTBot
# Disallow: /
#
# User-agent: ClaudeBot
# Disallow: /
#
# User-agent: CCBot
# Disallow: /
#
# User-agent: Bytespider
# Disallow: /
#
# User-agent: Google-Extended
# Disallow: /
#
# User-agent: PerplexityBot
# Disallow: /
#
# User-agent: Meta-ExternalAgent
# Disallow: /
OpenResty vs plain Nginx vs Nginx Unit vs Caddy
| Feature | OpenResty | Plain Nginx | Nginx Unit | Caddy |
|---|---|---|---|---|
| Bot check mechanism | access_by_lua_block — LuaJIT code iterates pattern list, string.find plain-text match | map $http_user_agent $is_bot { ... } + if ($is_bot) { return 403; } — static config only | Python/Ruby/PHP handler script called per request via unit config routing | header_regexp matcher + respond directive, or Caddy Lua module (less common) |
| Short-circuit | ngx.exit(ngx.HTTP_FORBIDDEN) at access phase; proxy_pass never executes | return 403 inside an if block; if is fragile in general, but a bare return is one of its safe uses | HTTP 403 response returned from application handler | respond 403 directive in Caddyfile, before reverse_proxy |
| X-Robots-Tag | header_filter_by_lua_block on pass-through; ngx.header[] in access block for 403 | add_header X-Robots-Tag "noai, noimageai" always — applies to all responses | Set via application framework response headers | header X-Robots-Tag "noai, noimageai" directive in Caddyfile |
| Dynamic bot list | lua_shared_dict — update at runtime without reload, atomic incr for metrics | Requires nginx -s reload to pick up map block changes | Reload unit config via REST API; application code can read dynamic sources | Requires config reload via Admin API or caddy reload |
| robots.txt serving | location = /robots.txt { root ...; } — exact match before Lua location, no access check | location = /robots.txt { root ...; } — identical, no Lua needed | Configured as a static route in unit config or served by application | file_server for /robots.txt before reverse_proxy block |
| Lua pattern matching | string.find(ua, pattern, 1, true): plain=true avoids % escaping for hyphens | PCRE regex in map block; ~* for case-insensitive; hyphens are literal outside character classes | Language-native string matching in application code | header_regexp uses Go's regexp package (RE2 syntax); hyphens are literal outside character classes |
| Performance | LuaJIT — near-native speed, JIT-compiled Lua, minimal overhead per request | Fastest — pure C, no scripting overhead; limited flexibility | Language startup overhead; Go/Python/Ruby runtimes; higher memory per worker | Go runtime — fast, GC pauses possible at scale; simpler ops than OpenResty |
Summary
- access phase, before upstream: access_by_lua_block runs before proxy_pass. Blocked requests never reach the upstream server.
- string.find(ua, pattern, 1, true): the true flag enables plain-text matching. Bot names with hyphens (chatgpt-user, meta-externalagent) are matched literally without %-escaping.
- init_by_lua_block: allocates the bot pattern table once at startup, copy-on-write shared across all workers. Not a per-request allocation.
- location = /robots.txt: the nginx exact match has higher priority than location /. robots.txt is always served without hitting any Lua code.
- lua_shared_dict: for runtime updates and metrics without an nginx reload. Atomic operations across all worker processes.