How to Block AI Bots in Julia Genie.jl
Genie.jl is Julia's full-featured web framework, widely used for serving machine learning models, scientific APIs, and data dashboards. Because Genie is built on top of HTTP.jl, bot blocking hooks into HTTP.jl's middleware stack — a handler-wrapping pattern that intercepts every request before Genie's router, giving you the earliest possible rejection point.
1. Bot pattern module
A dedicated AiBots module keeps patterns isolated and testable. lowercase() normalises the User-Agent; occursin(pattern, ua) does plain-text substring matching. any() short-circuits on the first match.
# src/ai_bots.jl
module AiBots
const PATTERNS = [
"gptbot",
"chatgpt-user",
"claudebot",
"anthropic-ai",
"ccbot",
"google-extended",
"cohere-ai",
"meta-externalagent",
"bytespider",
"omgili",
"diffbot",
"imagesiftbot",
"magpie-crawler",
"amazonbot",
"dataprovider",
"netcraft",
]
"""
is_ai_bot(user_agent::AbstractString) -> Bool
Returns true if the User-Agent string matches any known AI crawler pattern.
Pattern matching is case-insensitive substring search — no regex needed.
"""
function is_ai_bot(user_agent::AbstractString)
ua = lowercase(user_agent)
any(pattern -> occursin(pattern, ua), PATTERNS)
end
end # module2. HTTP.jl middleware function
HTTP.jl middleware is a higher-order function: handler -> req -> response. Return HTTP.Response(403, headers; body=...) to short-circuit. Call handler(req) to forward. Use HTTP.setheader(resp, ...) to add headers to the passing response without reconstructing it.
# src/middleware.jl
using HTTP
include("ai_bots.jl")
"""
HTTP.jl middleware that blocks AI crawlers and injects X-Robots-Tag.
Genie is built on HTTP.jl — pass this function to the middleware stack
via up(middleware=[bot_blocker]) or Server.startup(middleware=[bot_blocker]).
"""
function bot_blocker(handler)
function(req::HTTP.Request)
# Allow robots.txt regardless of User-Agent
if req.target != "/robots.txt"
ua = HTTP.header(req, "User-Agent", "")
if AiBots.is_ai_bot(ua)
return HTTP.Response(
403,
[
"X-Robots-Tag" => "noai, noimageai",
"Content-Type" => "text/plain; charset=utf-8",
];
body = "Forbidden",
)
end
end
# Pass to inner handler
resp = handler(req)
# Inject X-Robots-Tag on every passing response
HTTP.setheader(resp, "X-Robots-Tag" => "noai, noimageai")
return resp
end
end3. Register with Genie.Server.startup()
Pass the middleware function in the middleware vector. Multiple middleware functions run left-to-right — bot_blocker should come first so it fires before any authentication or logging middleware.
# app.jl — main Genie application
using Genie
include("src/middleware.jl")
include("src/routes.jl") # your route definitions
# Start the server with bot-blocking middleware
Genie.Server.startup(
port = 8000,
host = "0.0.0.0",
middleware = [bot_blocker],
async = false, # set true for non-blocking startup in scripts
)4. Routes (unchanged alongside middleware)
Genie's route macros (route, @get, @post) work without modification. Middleware runs transparently before any route handler is invoked.
# src/routes.jl — Genie routes (defined separately from middleware)
using Genie.Router, Genie.Renderer.Html, Genie.Renderer.Json
# Standard Genie route macros work unchanged alongside middleware
route("/") do
html("<h1>Hello</h1>")
end
route("/api/data") do
json(Dict("status" => "ok"))
end
# robots.txt — can also be a static file in public/robots.txt
# Genie serves public/ automatically; middleware path check handles both.
route("/robots.txt") do
Genie.Renderer.respond(
readfile("public/robots.txt"),
200,
Dict("Content-Type" => "text/plain"),
)
end5. public/robots.txt
Genie serves the public/ directory automatically. The req.target != "/robots.txt" guard in the middleware ensures AI crawlers can still fetch robots.txt to learn they are disallowed — which is the correct and standards-compliant behaviour.
# public/robots.txt
# Genie automatically serves files from public/ before the router.
# The /robots.txt path check in bot_blocker covers both the static
# file and the route handler variant.
User-agent: *
Allow: /
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /6. occursin() vs regex
# Why occursin() instead of regex?
#
# occursin(pattern, haystack) is a plain substring search — O(n·m) but
# with zero allocation overhead for short patterns. For a list of 16
# fixed patterns checked once per request, this is faster and simpler
# than compiling a regex or using a Trie.
#
# Julia's any() short-circuits on the first true match, so blocked bots
# rarely cause more than 1-2 pattern comparisons.
#
# Equivalent regex approach (if needed for complex patterns):
using Base.RegexMatch
const BOT_REGEX = Regex(join(AiBots.PATTERNS, "|"), "i")
is_ai_bot_regex(ua::String) = occursin(BOT_REGEX, ua)7. Project.toml
# Project.toml
[deps]
Genie = "c43c736e-a2d1-11e8-161f-af95117fbd1e"
HTTP = "cd3eb016-35fb-5094-929b-558a96fad6f3"
# Install dependencies:
# julia --project=. -e 'using Pkg; Pkg.instantiate()'
#
# Run:
# julia --project=. app.jlKey points
- Middleware layer: HTTP.jl middleware fires before Genie's router — it catches every request including static file serving, Genie's built-in
/_devroutes, and WebSocket upgrades. - Header access:
HTTP.header(req, "User-Agent", "")— third argument is the default value if the header is absent. Headers are stored case-insensitively. - Short-circuit: Return
HTTP.Response(403, headers; body="...")directly — do not callhandler(req). Julia's keyword argument syntax requiresbody=rather than a positional argument. - Pass-through headers:
HTTP.setheader(resp, key => value)mutates the response in-place. More efficient than constructing a newHTTP.Response. - occursin(): Returns
Bool. Combine withany(f, collection)for short-circuit evaluation across the pattern list — stops on first match. - JIT warm-up: Julia compiles on first call. In production, call
bot_blockeronce during startup (with a dummy request) to precompile and avoid latency on the first real request.
Framework comparison — scientific / ML ecosystem
| Framework | Middleware pattern | Short-circuit | Pattern match |
|---|---|---|---|
| Julia Genie.jl | handler -> req -> resp | HTTP.Response(403, ...) | occursin() |
| Python FastAPI | BaseHTTPMiddleware | Response(status_code=403) | in ua_lower |
| Python Flask | @app.before_request | abort(403) | in ua_lower |
| R Plumber | pr_filter() | res$status <- 403; return() | grepl() |
Genie.jl follows the same handler-wrapping pattern as Python's ASGI middleware (Starlette/FastAPI) — a function that receives the next handler and returns a new handler. The key Julia-specific detail is the JIT warm-up consideration: unlike Python or Go, Julia incurs a compilation cost on first execution, making a startup warm-up call worth adding in latency-sensitive APIs.