How to Block AI Bots in Lua Lapis
Lapis is a Lua web framework that runs on OpenResty (nginx + LuaJIT). It provides a higher-level MVC abstraction over raw OpenResty, supports both Lua and MoonScript, and includes a routing DSL, database models, sessions, and a before_filter system that runs before every action. Lapis sits above the OpenResty/nginx layer: requests reach Lapis only after nginx has accepted the connection and run its rewrite and access phases. The Lapis-specific detail for bot blocking is that a before_filter halts the request by calling self:write(...), not by returning a value. Lapis ignores whatever the filter returns. If write was called, the action function never runs. The table passed to write accepts the same options as an action's return value, including status and a headers key, so the blocked response and its headers are set in a single call.
1. Bot detection module
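A Lua module with no external dependencies. Each pattern is matched against the lowercased UA string (ua:lower()) with s:find(pattern, 1, true). The true sits in the fourth parameter slot of string.find(s, pattern, init, plain) and enables plain mode, which does a literal substring search and bypasses the Lua pattern engine.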
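You can see why plain mode is required by running a quick check in any Lua 5.1+ or LuaJIT interpreter. In Lua patterns the hyphen is a lazy quantifier, so "chatgpt-user" used as a pattern means "chatgp", as few "t" characters as possible, then "user". That never matches the literal token. The UA string below is an illustrative example, not a captured request.

```lua
-- Why plain mode matters: in Lua patterns "-" is a lazy quantifier, so the
-- pattern "chatgpt-user" means "chatgp", zero or more "t" (as few as
-- possible), then "user". It cannot match the literal "chatgpt-user".
local ua = ("Mozilla/5.0 ChatGPT-User/1.0"):lower()  -- illustrative UA

-- Pattern mode: fails on the hyphen.
local pattern_start = ua:find("chatgpt-user")

-- Plain mode (plain = true): literal substring search.
local plain_start, plain_end = ua:find("chatgpt-user", 1, true)

print(pattern_start)           -- prints nil
print(plain_start, plain_end)  -- prints 13 and 24
```

The same failure affects every hyphenated entry in the list (anthropic-ai, cohere-ai, meta-externalagent, magpie-crawler). Without the plain flag they would silently never match.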
```lua
-- bot_utils.lua: bot detection, no dependencies
local M = {}

-- All lowercase; matched against ua:lower()
local AI_BOT_PATTERNS = {
  "gptbot",
  "chatgpt-user",
  "claudebot",
  "anthropic-ai",
  "ccbot",
  "google-extended", -- robots.txt token; Google does not send it as a User-Agent
  "cohere-ai",
  "meta-externalagent",
  "bytespider",
  "omgili",
  "diffbot",
  "imagesiftbot",
  "magpie-crawler",
  "amazonbot",
  "dataprovider",
  "netcraft",
}

--- Returns true if the User-Agent string matches a known AI crawler.
function M.is_ai_bot(ua)
  -- ngx.req.get_headers() (which backs self.req.headers) returns a table
  -- when a header is repeated; join the values so none are skipped.
  if type(ua) == "table" then ua = table.concat(ua, " ") end
  if type(ua) ~= "string" or ua == "" then return false end
  local lower = ua:lower()
  -- string.find() with plain = true: literal substring, no pattern engine
  for _, pattern in ipairs(AI_BOT_PATTERNS) do
    if lower:find(pattern, 1, true) then
      return true
    end
  end
  return false
end

return M
```

2. before_filter: global bot blocking
app:before_filter(fn) registers a function that runs before every action. To short-circuit, call self:write(...) with a response table. To pass through, return without writing. The filter's return value is ignored, so returning a table on its own does not stop the action. The headers key in the write table sets headers on the blocked response. On requests that pass through, set headers with self.res.headers.
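Returning a table is not enough because, as far as we can tell from the Lapis source, the framework temporarily wraps the request's write method while a filter runs. It then checks only whether write was called. The sketch below is a simplified, self-contained model of that check. Request and run_before_filter are illustrative stand-ins, not Lapis's actual API.

```lua
-- Simplified model of Lapis's before_filter check (illustrative stand-in,
-- not Lapis's source): the filter's return value is discarded, and the
-- request counts as handled only if write was called while it ran.
local Request = {}
Request.__index = Request

function Request.new()
  return setmetatable({ buffer = {} }, Request)
end

function Request:write(response)
  table.insert(self.buffer, response)
end

local function run_before_filter(filter, req)
  local original_write = req.write   -- found via the metatable
  local written = false
  -- Shadow write on the instance so any call from the filter is recorded.
  req.write = function(...)
    written = true
    return original_write(...)
  end
  filter(req)                        -- return value ignored
  req.write = nil                    -- drop the shadow; the metatable method is visible again
  return written
end

-- Only returns a table: in this model the action would still run.
local function returns_table(self)
  return { status = 403, "Forbidden" }
end

-- Calls write: in this model the action would be skipped.
local function calls_write(self)
  return self:write({ status = 403, "Forbidden" })
end

print(run_before_filter(returns_table, Request.new()))  -- prints false
print(run_before_filter(calls_write, Request.new()))    -- prints true
```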
```lua
-- app.lua: Lapis application
local lapis = require("lapis")
local bot_utils = require("bot_utils")

local app = lapis.Application()

-- ── Global before_filter ──────────────────────────────────────────────────────
-- before_filter fires before every action in this application.
-- Calling self:write(...) short-circuits: the action is skipped and the
-- written table becomes the response. The filter's return value is ignored.
app:before_filter(function(self)
  -- Path guard: let robots.txt through.
  -- In most deployments, nginx serves robots.txt as a static file before
  -- Lapis runs. This guard covers setups where the request reaches Lapis.
  -- cmd_url is the raw request URI, including any query string.
  if self.req.cmd_url == "/robots.txt" then
    return -- no write = pass through
  end

  -- Request header names are lowercase (Lapis reads ngx.req.get_headers()).
  -- self.req.headers["user-agent"] is nil when the header is absent.
  local ua = self.req.headers["user-agent"] or ""

  if bot_utils.is_ai_bot(ua) then
    -- self:write() halts the request: Lapis skips the action once a
    -- before_filter has written. Returning this table instead would be
    -- ignored and the action would still run.
    -- 'headers' sets headers on the blocked response; layout = false keeps
    -- the body from being wrapped in the HTML layout.
    return self:write({
      status = 403,
      layout = false,
      content_type = "text/plain",
      headers = {
        ["X-Robots-Tag"] = "noai, noimageai",
      },
      "Forbidden",
    })
  end

  -- Pass-through: set X-Robots-Tag on all non-blocked responses.
  -- self.res.headers is the response headers table for the current request.
  self.res.headers["X-Robots-Tag"] = "noai, noimageai"
end)

-- ── Routes ────────────────────────────────────────────────────────────────────
app:get("/", function(self)
  return { json = { message = "Hello" } }
end)

app:get("/api/data", function(self)
  return { json = { data = "value" } }
end)

app:get("/health", function(self)
  return { json = { status = "ok" } }
end)

return app
```

3. MoonScript variant
Lapis itself is written in MoonScript, which allows a terser, class-based application definition. Inside the class body, @before_filter registers the filter. The filter uses the fat arrow (=>), which binds self, so @req, @res, and @write refer to self.req, self.res, and self:write. As in the Lua version, the filter blocks by calling @write, not by returning a table.
```moonscript
-- app.moon: MoonScript variant
-- MoonScript compiles to Lua; the logic is identical with terser syntax.
-- Indentation is significant in MoonScript.
lapis = require "lapis"
bot_utils = require "bot_utils"

-- Anonymous class, as in the app.moon that `lapis new` generates; as the
-- file's last expression it is returned as the module.
class extends lapis.Application
  @before_filter =>
    return if @req.cmd_url == "/robots.txt"

    ua = @req.headers["user-agent"] or ""
    if bot_utils.is_ai_bot ua
      -- @write short-circuits the action; a returned table would be ignored
      return @write {
        status: 403
        layout: false
        content_type: "text/plain"
        headers: {
          "X-Robots-Tag": "noai, noimageai"
        }
        "Forbidden"
      }

    @res.headers["X-Robots-Tag"] = "noai, noimageai"

  "/": =>
    json: { message: "Hello" }

  "/api/data": =>
    json: { data: "value" }
```

4. Scoped protection: action wrapper
When some routes should bypass the filter (health checks, public endpoints), use an action wrapper function instead of a global filter with path guards. The wrapper is applied only to actions that need protection — no special cases needed in the filter logic.
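A wrapper of this kind is ordinary Lua, so you can check its behaviour without OpenResty or a running Lapis server. This is a minimal sketch that assumes a stubbed request object. fake_self is hypothetical and carries only the fields the wrapper reads, and is_ai_bot is a one-pattern stand-in for bot_utils.is_ai_bot.

```lua
-- Stand-in for bot_utils.is_ai_bot with a single pattern (illustrative only).
local function is_ai_bot(ua)
  return ua:lower():find("gptbot", 1, true) ~= nil
end

-- Same logic as the protected() wrapper in this section. Inside an action,
-- unlike a before_filter, the returned table IS the response.
local function protected(action)
  return function(self)
    local ua = self.req.headers["user-agent"] or ""
    if is_ai_bot(ua) then
      return {
        status = 403,
        layout = false,
        headers = { ["X-Robots-Tag"] = "noai, noimageai" },
        "Forbidden",
      }
    end
    self.res.headers["X-Robots-Tag"] = "noai, noimageai"
    return action(self)
  end
end

-- Hypothetical fake request object: only the fields the wrapper touches.
local function fake_self(ua)
  return { req = { headers = { ["user-agent"] = ua } }, res = { headers = {} } }
end

local handler = protected(function(self)
  return { json = { data = "value" } }
end)

local bot_response = handler(fake_self("Mozilla/5.0 (compatible; GPTBot/1.2)"))
local human = fake_self("Mozilla/5.0 (X11; Linux x86_64)")
local human_response = handler(human)

print(bot_response.status, bot_response[1])  -- prints 403 and Forbidden
print(human_response.json.data)              -- prints value
print(human.res.headers["X-Robots-Tag"])     -- prints noai, noimageai
```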
```lua
-- Scoped protection: wrap only the actions that need it.
-- Use this when some routes (health check, public API) should bypass the filter.
local lapis = require("lapis")
local bot_utils = require("bot_utils")

local app = lapis.Application()

-- Helper: wrap an action with bot blocking.
-- Inside an action (unlike a before_filter), the returned table is the response.
local function protected(action)
  return function(self)
    local ua = self.req.headers["user-agent"] or ""
    if bot_utils.is_ai_bot(ua) then
      return {
        status = 403,
        layout = false,
        headers = { ["X-Robots-Tag"] = "noai, noimageai" },
        "Forbidden",
      }
    end
    self.res.headers["X-Robots-Tag"] = "noai, noimageai"
    return action(self)
  end
end

-- Public endpoint: no filter
app:get("/health", function(self)
  return { json = { status = "ok" } }
end)

-- Protected endpoint: bot filter applied via wrapper
app:get("/api/data", protected(function(self)
  return { json = { data = "value" } }
end))

return app
```

5. Direct OpenResty access via ngx.var
Inside a Lapis before_filter, you can also call OpenResty's ngx API directly. ngx.var.http_user_agent is the raw nginx variable. nginx builds the variable name by lowercasing the header name, replacing hyphens with underscores, and prefixing http_. Calling ngx.exit() inside Lapis bypasses the Lapis response pipeline entirely, so prefer self:write() in Lapis applications.
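The naming rule is mechanical, so a few lines of plain Lua can express it. ngx_var_name below is a hypothetical helper for illustration only. nginx performs this mapping itself, and OpenResty exposes no such function.

```lua
-- Hypothetical helper (not part of OpenResty): reproduces nginx's naming
-- rule for request-header variables: lowercase, "-" to "_", "http_" prefix.
local function ngx_var_name(header_name)
  -- gsub also returns a match count; assigning to one local keeps the string
  local name = header_name:lower():gsub("%-", "_")
  return "http_" .. name
end

print(ngx_var_name("User-Agent"))       -- prints http_user_agent
print(ngx_var_name("X-Forwarded-For"))  -- prints http_x_forwarded_for
```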
```lua
-- Lower-level: ngx.var.http_user_agent (direct OpenResty access)
-- Use this inside a before_filter when you want the raw nginx variable
-- rather than Lapis's header table.
-- Note: ngx.var.http_user_agent is nil when the header is absent.
app:before_filter(function(self)
  if self.req.cmd_url == "/robots.txt" then return end

  -- nginx lowercases the header name, replaces hyphens with underscores,
  -- and prefixes http_: "User-Agent" → ngx.var.http_user_agent
  local ua = ngx.var.http_user_agent or ""

  if bot_utils.is_ai_bot(ua) then
    -- Raw OpenResty approach: bypasses the Lapis response pipeline.
    -- Prefer self:write() inside Lapis; use this only when you need to
    -- abort without Lapis writing anything.
    ngx.status = ngx.HTTP_FORBIDDEN
    ngx.header["X-Robots-Tag"] = "noai, noimageai"
    ngx.header["Content-Type"] = "text/plain"
    ngx.say("Forbidden")
    -- The 403 status and body are already sent, so finish the request with
    -- ngx.HTTP_OK, as the lua-nginx-module ngx.exit docs show for custom output.
    return ngx.exit(ngx.HTTP_OK)
  end

  self.res.headers["X-Robots-Tag"] = "noai, noimageai"
end)
```

6. nginx.conf: static robots.txt + Lapis routing
Configure nginx to serve robots.txt as a static file before Lapis handles the request. The Lapis before_filter never fires for statically-served files. All other requests are passed to Lapis via content_by_lua_block.
```nginx
# nginx.conf: OpenResty configuration for Lapis
# Place this server block inside the http block.
server {
  listen 8080;
  server_name example.com;

  # Serve robots.txt as a static file; Lapis never runs for it,
  # so the before_filter does not fire. This is the recommended approach.
  location = /robots.txt {
    root /var/www/html;   # file lives at /var/www/html/robots.txt
    add_header X-Robots-Tag "noai, noimageai";
    try_files $uri =404;
  }

  # All other requests: route through Lapis
  location / {
    # content_by_lua_block hands the request to the Lapis app module "app"
    content_by_lua_block {
      require("lapis").serve("app")
    }
  }
}
```

7. robots.txt
```text
# robots.txt (at /var/www/html/robots.txt, served by nginx before Lapis)
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Key points
- Write, don't return, to short-circuit: inside before_filter, calling self:write({...}) makes Lapis skip the action and use the written table as the response. Lapis discards the filter's return value, so returning a table on its own lets the request through to the action. To pass through, finish the function without calling write.
- headers key in the write table: include headers = { ["X-Robots-Tag"] = "..." } in the table passed to self:write to set response headers on the blocked response in the same call.
- Request headers are lowercased; response headers are not: Lapis normalises incoming header names to lowercase, so read self.req.headers["user-agent"]. Response headers set via self.res.headers["X-Robots-Tag"] are sent as written, so use canonical casing for them.
- string.find(s, pattern, 1, true) requires plain mode: without true as the fourth argument, Lua treats the search string as a Lua pattern (a limited regex dialect). In that dialect the hyphen is a lazy quantifier and the dot matches any character, so hyphenated names such as chatgpt-user fail to match. Passing true forces a literal substring search.
- ngx.var.http_user_agent vs self.req.headers: both work inside Lapis. ngx.var.http_user_agent is the raw nginx variable, while self.req.headers["user-agent"] is the idiomatic choice in Lapis code. Use ngx.exit() only when you deliberately want to bypass the Lapis response pipeline.
- MoonScript @ is self: in MoonScript, @req is self.req, @res is self.res, and @write x is self:write(x). The fat arrow => binds self as the first argument, and @before_filter in the class body is equivalent to app:before_filter in Lua.
Framework comparison — Lua / OpenResty web frameworks
| Framework | Hook / phase | Block call | UA header |
|---|---|---|---|
| Lapis | app:before_filter(fn) | self:write({ status = 403, ... }) | self.req.headers["user-agent"] |
| Raw OpenResty | access_by_lua_block | ngx.exit(403) | ngx.var.http_user_agent |
| lua-http (standalone) | request handler function | headers:upsert(":status", "403") | headers:get("user-agent") |
| nginx (config only) | if ($http_user_agent ~* pattern) | return 403; | $http_user_agent |
Lapis sits between raw OpenResty and a full MVC framework — it gives you table-return responses and a routing DSL while retaining direct access to the ngx API. Raw OpenResty's access_by_lua_block fires earlier in the nginx pipeline (before the content phase) and is more efficient for blocking, but has no application context. Lapis before_filter is the right choice when you already have a Lapis application and need access to sessions, models, or Lapis helpers alongside bot detection.
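A handler module for the access_by_lua_block route might look like the minimal sketch below. The module name ai_bot_gate, its trimmed pattern list, and the check_request function are all hypothetical. You would wire it in with something like access_by_lua_block { require("ai_bot_gate").check_request() }, with ai_bot_gate.lua on lua_package_path. The module uses only ngx.var and ngx.exit.

```lua
-- ai_bot_gate.lua (hypothetical module name): access-phase AI bot gate for
-- raw OpenResty. Runs before the content phase, so Lapis never sees
-- blocked requests.
local M = {}

-- Trimmed, illustrative list; in practice reuse bot_utils.lua's list.
local PATTERNS = { "gptbot", "claudebot", "ccbot", "bytespider" }

local function is_ai_bot(ua)
  -- ngx.var.http_user_agent is nil when the header is absent
  if type(ua) ~= "string" or ua == "" then return false end
  local lower = ua:lower()
  for _, p in ipairs(PATTERNS) do
    if lower:find(p, 1, true) then return true end
  end
  return false
end

function M.check_request()
  -- ngx.var.uri is the normalised path, without the query string
  if ngx.var.uri == "/robots.txt" then return end
  if is_ai_bot(ngx.var.http_user_agent) then
    -- In the access phase, exiting with 403 makes nginx send its standard
    -- 403 page and skip the content phase.
    return ngx.exit(ngx.HTTP_FORBIDDEN)
  end
end

-- As a module file, this would end with: return M
```

This design keeps bot blocking out of the application entirely. The trade-off, as noted above, is that the handler cannot see sessions, models, or other Lapis context.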
Dependencies
```bash
# Install OpenResty (includes LuaJIT)
# macOS (Homebrew, via OpenResty's own tap)
brew install openresty/brew/openresty
# Ubuntu/Debian: OpenResty is not in the default repositories;
# add OpenResty's official APT repository first, then:
sudo apt install openresty
# Install Lapis via LuaRocks
luarocks install lapis
# Install MoonScript (optional, for .moon files)
luarocks install moonscript
# Create a new Lapis project
lapis new --lua   # Lua project
lapis new         # MoonScript project (default)
# Run development server (uses OpenResty internally)
lapis server      # starts on port 8080 by default
# Production: compile the nginx.conf template for the production environment
# lapis build production   # writes nginx.conf.compiled
# openresty -p . -c nginx.conf.compiled
```