How to Block AI Bots in Common Lisp Hunchentoot
Hunchentoot is one of the most widely used Common Lisp web servers. It is a threaded HTTP server with a CLOS-based acceptor architecture, simple handler dispatch, and special variables for request and reply state. There are two approaches to bot blocking. The simpler one is *before-request-hook*: set it to a zero-argument function that runs before every request. The more idiomatic Lisp approach is to define a custom acceptor class and specialise acceptor-dispatch-request via CLOS. Both use the same short-circuit. First set (hunchentoot:return-code*) to +http-forbidden+, then call (hunchentoot:abort-request-handler). That call performs a non-local exit (a THROW to a catch tag Hunchentoot establishes around the handler), which ends processing of the request immediately. Header access uses (hunchentoot:header-in* :user-agent). The * suffix marks a function that operates on the current request (the *request* special variable) by default.
1. Bot detection
This is a plain Common Lisp function with no library dependencies. SEARCH does literal substring matching and returns the start index or NIL, so its result works directly as a truth value. SOME stops at the first pattern that matches. string-downcase is applied once, before the loop.
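To see what the two primitives actually return, here is a small standalone check in plain ANSI Common Lisp. The shortened pattern list and the first-match-index helper are illustrative names only. The point is that SOME hands back SEARCH's index rather than T, and that an index of 0 still counts as true.

```lisp
;;; Standalone check of SEARCH and SOME; plain ANSI Common Lisp.
;;; *DEMO-PATTERNS* and FIRST-MATCH-INDEX are illustrative names only.

(defparameter *demo-patterns* '("gptbot" "claudebot" "ccbot")
  "A shortened pattern list for the demonstration.")

(defun first-match-index (ua patterns)
  "Return SEARCH's index for the first pattern found in UA, or NIL."
  (let ((lower (string-downcase ua)))
    ;; SOME returns the first non-NIL value of the lambda, i.e. an index, not T.
    (some (lambda (pattern) (search pattern lower)) patterns)))

(search "bot" "gptbot")                          ; => 3
(search "bot" "mozilla/5.0")                     ; => NIL
(first-match-index "CCBot/2.0" *demo-patterns*)  ; => 0, which is still true
(first-match-index "Mozilla/5.0 (X11; Linux x86_64)" *demo-patterns*) ; => NIL
```

Only NIL is false in Common Lisp, so a match at position 0 is handled correctly by WHEN, IF, and SOME.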
```lisp
;;; bot-utils.lisp — AI bot detection, no dependencies
(in-package :my-app)

;;; All lowercase — matched against (string-downcase ua)
(defparameter *ai-bot-patterns*
  '("gptbot"
    "chatgpt-user"
    "claudebot"
    "anthropic-ai"
    "ccbot"
    "google-extended"
    "cohere-ai"
    "meta-externalagent"
    "bytespider"
    "omgili"
    "diffbot"
    "imagesiftbot"
    "magpie-crawler"
    "amazonbot"
    "dataprovider"
    "netcraft")
  "Lowercase substrings to match against the User-Agent header.")

(defun ai-bot-p (ua)
  "Return a true value (the match index) if UA contains a known AI crawler pattern, else NIL."
  (when (and ua (not (string= ua "")))
    (let ((lower (string-downcase ua)))
      ;; SEARCH returns the start index or NIL — literal substring, no regex
      (some (lambda (pattern) (search pattern lower)) *ai-bot-patterns*))))
```

2. *before-request-hook* — global filter
Set hunchentoot:*before-request-hook* to a function of zero arguments. It runs in the request context, so *request* and *reply* are bound. Set (return-code*) to +http-forbidden+ before calling abort-request-handler. The call exits the handler immediately, so any code after it never runs.
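The ordering rule can be checked without a server. The sketch below models the abort with CATCH and THROW and assumes only that the abort leaves the handler at once. *status*, abort-handler and run-handler are hypothetical stand-ins, not Hunchentoot API.

```lisp
;;; Server-free model of "set the status, then abort".
;;; *STATUS*, ABORT-HANDLER and RUN-HANDLER are hypothetical stand-ins,
;;; not Hunchentoot functions.

(defvar *status* 200
  "Stands in for the reply's status code.")

(defun abort-handler (&optional result)
  "Leave the running handler at once; RESULT becomes the body."
  (throw 'handler-done result))

(defun run-handler (handler)
  "Run HANDLER with a fresh status; return (values status body)."
  (let ((*status* 200))
    (let ((body (catch 'handler-done (funcall handler))))
      (values *status* body))))

(defun good-handler ()
  (setf *status* 403)             ; status first...
  (abort-handler "Forbidden"))    ; ...then the non-local exit

(defun bad-handler ()
  (abort-handler "Forbidden")
  (setf *status* 403))            ; never reached: the THROW has already unwound
```

(run-handler #'good-handler) returns 403 and "Forbidden". (run-handler #'bad-handler) returns 200 and "Forbidden", which is a blocked request that still reports success.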
```lisp
;;; server.lisp — Hunchentoot server with *before-request-hook*
(in-package :my-app)

;;; ── *before-request-hook* approach ──────────────────────────────────────────
;;; Set to a zero-argument function called before every request dispatch.
;;; Runs inside the request context — *request* and *reply* are bound.
;;; Return value is ignored; use abort-request-handler to short-circuit.
(defun check-ai-bot ()
  "Block AI crawlers before request dispatch."
  ;; Path guard: let robots.txt through so crawlers can read the rules.
  ;; This runs before dispatch, so static-file requests are checked too.
  (let ((path (hunchentoot:script-name* hunchentoot:*request*)))
    (when (string= path "/robots.txt")
      (return-from check-ai-bot)))      ; pass through
  ;; header-in* reads an incoming request header and returns NIL when it
  ;; is absent. Header names are interned as keywords, so :user-agent
  ;; matches the User-Agent header whatever its case on the wire.
  (let ((ua (or (hunchentoot:header-in* :user-agent) "")))
    (when (ai-bot-p ua)
      ;; setf header-out writes to the outgoing reply (*reply* special var).
      (setf (hunchentoot:header-out "X-Robots-Tag") "noai, noimageai")
      ;; Set the status first: abort-request-handler exits immediately,
      ;; so nothing after it runs. +http-forbidden+ = 403
      (setf (hunchentoot:return-code*) hunchentoot:+http-forbidden+)
      ;; Non-local exit; "Forbidden" becomes the response body.
      (hunchentoot:abort-request-handler "Forbidden"))))

;;; Register the hook globally — fires for every request on every acceptor.
(setf hunchentoot:*before-request-hook* #'check-ai-bot)
```

3. *after-request-hook* — X-Robots-Tag on passing responses
*after-request-hook* fires after the handler runs for requests that were not aborted. Together with the before-hook, every response gets X-Robots-Tag with no duplication.
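The "every response, exactly once" claim above can be modelled without a server. In this sketch, a before-step may tag the response and abort. The handler runs only if it did not abort, and an after-step tags only requests that completed. model-request and set-header are hypothetical, and an alist stands in for the reply's outgoing headers. It models the flow described above, not Hunchentoot's internals.

```lisp
;;; Model of the before/after hook flow; not Hunchentoot code.
;;; MODEL-REQUEST and SET-HEADER are hypothetical, and an alist stands in
;;; for the reply's outgoing headers.

(defun set-header (headers name value)
  "Return HEADERS with NAME set to VALUE, replacing any earlier entry."
  (acons name value (remove name headers :key #'car :test #'string-equal)))

(defun model-request (ua handler)
  "Run a before-step, HANDLER, then an after-step; return (values status headers body)."
  (let ((status 200) (headers '()) (completed nil) (body nil))
    (setf body
          (catch 'request-aborted
            ;; Before-step: tag and block matching user agents.
            (when (search "gptbot" (string-downcase ua))
              (setf headers (set-header headers "X-Robots-Tag" "noai, noimageai")
                    status 403)
              (throw 'request-aborted "Forbidden"))
            ;; Handler: reached only if the before-step did not abort.
            (prog1 (funcall handler)
              (setf completed t))))
    ;; After-step: tag only requests that completed normally.
    (when completed
      (setf headers (set-header headers "X-Robots-Tag" "noai, noimageai")))
    (values status headers body)))
```

In both branches the header list ends up with exactly one X-Robots-Tag entry, because each path sets it once.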
```lisp
;;; Add *after-request-hook* for X-Robots-Tag on passing responses.
;;; Fires after the handler runs — for requests that were not aborted.
(defun add-robots-header ()
  "Add X-Robots-Tag to all passing responses."
  (setf (hunchentoot:header-out "X-Robots-Tag") "noai, noimageai"))

(setf hunchentoot:*after-request-hook* #'add-robots-header)

;;; Note: *before-request-hook* and *after-request-hook* are complementary:
;;;   - Blocked requests: X-Robots-Tag set in check-ai-bot (before abort)
;;;   - Passing requests: X-Robots-Tag set in add-robots-header (after handler)
;;; This covers all responses without duplication.
```

4. Custom acceptor — CLOS method specialisation
Define a subclass of easy-acceptor and specialise acceptor-dispatch-request on it. This is the more idiomatic Common Lisp approach. The filter applies only to acceptors of this class, dispatch goes through ordinary CLOS method selection, and no global variable is mutated. For requests that pass, call-next-method invokes the inherited easy-acceptor dispatch.
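The same shape works in plain CLOS, without Hunchentoot. A base class supplies the default method, a subclass specialises it, and call-next-method falls through for requests that pass. base-server, filtering-server and handle-ua are hypothetical names chosen for this sketch.

```lisp
;;; Plain-CLOS model of the acceptor pattern; no Hunchentoot involved.
;;; BASE-SERVER, FILTERING-SERVER and HANDLE-UA are hypothetical names.

(defclass base-server () ()
  (:documentation "Stands in for easy-acceptor: supplies the default dispatch."))

(defclass filtering-server (base-server) ()
  (:documentation "Stands in for bot-blocking-acceptor."))

(defgeneric handle-ua (server ua)
  (:documentation "Return (values status body) for a request from UA."))

(defmethod handle-ua ((server base-server) ua)
  (declare (ignore ua))
  (values 200 "Hello"))

(defmethod handle-ua ((server filtering-server) ua)
  (if (search "gptbot" (string-downcase ua))
      (values 403 "Forbidden")   ; blocked: the base method never runs
      (call-next-method)))       ; passing: fall through to BASE-SERVER's method
```

The check lives in the subclass's method, so a plain base-server instance is never filtered. That is the per-acceptor scoping the CLOS approach relies on.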
```lisp
;;; Alternative: CLOS acceptor override — per-acceptor bot blocking.
;;; More idiomatic Common Lisp; gives per-instance control.
;;; Use this when you have multiple acceptors and want to scope the filter.
(defclass bot-blocking-acceptor (hunchentoot:easy-acceptor)
  ()
  (:documentation "Acceptor that blocks AI crawlers before dispatch."))

;;; Specialise acceptor-dispatch-request on our custom class.
;;; Called for every request, before any handler runs.
(defmethod hunchentoot:acceptor-dispatch-request
    ((acceptor bot-blocking-acceptor) request)
  (let ((path (hunchentoot:script-name* request))
        (ua (or (hunchentoot:header-in* :user-agent request) "")))
    ;; Path guard: always serve robots.txt
    (unless (string= path "/robots.txt")
      (when (ai-bot-p ua)
        (setf (hunchentoot:header-out "X-Robots-Tag") "noai, noimageai")
        (setf (hunchentoot:return-code*) hunchentoot:+http-forbidden+)
        ;; Exits here; the call-next-method below is never reached.
        (hunchentoot:abort-request-handler "Forbidden"))))
  ;; Pass through: inject X-Robots-Tag, then call the next method (dispatch).
  (setf (hunchentoot:header-out "X-Robots-Tag") "noai, noimageai")
  (call-next-method))
```

5. Route handlers — define-easy-handler
define-easy-handler registers a handler at a URI path. The before-hook fires before any handler is called — blocked requests never reach these functions.
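The /robots.txt handler in this section hard-codes its list of crawlers. If you would rather keep that list as data, here is a sketch that builds the same kind of body with FORMAT. robots-txt-body and *blocked-agents* are hypothetical helpers, not part of Hunchentoot.

```lisp
;;; Build a robots.txt body that disallows a list of crawlers.
;;; ROBOTS-TXT-BODY and *BLOCKED-AGENTS* are hypothetical helpers.

(defparameter *blocked-agents* '("GPTBot" "ClaudeBot" "CCBot" "Google-Extended")
  "robots.txt product tokens, in their conventional capitalisation.")

(defun robots-txt-body (agents)
  "Allow everyone by default, then add one Disallow group per agent in AGENTS."
  (with-output-to-string (out)
    (format out "User-agent: *~%Allow: /~%")
    (dolist (agent agents)
      ;; A blank line separates groups for readability.
      (format out "~%User-agent: ~A~%Disallow: /~%" agent))))
```

Inside the easy handler, you could then replace the string literal with (robots-txt-body *blocked-agents*).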
```lisp
;;; handlers.lisp — Easy handlers (routes)
(in-package :my-app)

;;; define-easy-handler registers a handler at a URI path.
;;; The before-hook fires before this handler is called.
(hunchentoot:define-easy-handler (index :uri "/") ()
  (setf (hunchentoot:content-type*) "application/json")
  ;; Double quotes inside a Lisp string literal must be escaped.
  "{\"message\": \"Hello\"}")

(hunchentoot:define-easy-handler (api-data :uri "/api/data") ()
  (setf (hunchentoot:content-type*) "application/json")
  "{\"data\": \"value\"}")

(hunchentoot:define-easy-handler (robots-txt :uri "/robots.txt") ()
  (setf (hunchentoot:content-type*) "text/plain")
  "User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
")
```

6. Start the server
```lisp
;;; main.lisp — start the Hunchentoot server
(in-package :my-app)

(defvar *server* nil
  "The running acceptor, or NIL when the server is stopped.")

(defun start-server (&key (port 8080))
  "Register the hooks and start an easy-acceptor on PORT (default 8080)."
  ;; Register hooks
  (setf hunchentoot:*before-request-hook* #'check-ai-bot)
  (setf hunchentoot:*after-request-hook* #'add-robots-header)
  ;; easy-acceptor uses easy-handler dispatch (define-easy-handler).
  ;; To use the custom CLOS acceptor instead, make an instance of
  ;; 'bot-blocking-acceptor here and set both hooks to NIL,
  ;; otherwise the User-Agent check runs twice per request.
  (setf *server*
        (hunchentoot:start
         (make-instance 'hunchentoot:easy-acceptor :port port))))

(defun stop-server ()
  "Stop the running acceptor, if any."
  (when *server*
    (hunchentoot:stop *server*)
    (setf *server* nil)))
```

7. ASDF system definition
```lisp
;;; my-app.asd — ASDF system definition
(asdf:defsystem #:my-app
  :description "Hunchentoot web application with AI bot blocking"
  :author "Example"
  :license "MIT"
  :version "0.1.0"
  :depends-on (#:hunchentoot)
  :components ((:file "package")
               (:file "bot-utils" :depends-on ("package"))
               (:file "server" :depends-on ("bot-utils"))
               (:file "handlers" :depends-on ("server"))
               (:file "main" :depends-on ("handlers"))))

;;; package.lisp
(defpackage #:my-app
  (:use #:common-lisp)
  (:export #:start-server #:stop-server))
```

Key points
- Set return-code* before abort-request-handler: abort-request-handler leaves the handler immediately through a non-local exit, so any code after it never runs. Set (return-code*) to +http-forbidden+ (403) first, then call abort-request-handler.
- header-in* vs header-out: header-in* reads incoming request headers, and header-out (via setf) writes outgoing response headers. The * suffix on header-in* and return-code* marks functions that default to the current *request* or *reply* special variable. This is a Hunchentoot naming convention.
- header-in* returns NIL for absent headers: when the header is not present, header-in* returns NIL, not an empty string. Use (or (header-in* :user-agent) "") to get a safe string before string operations.
- SEARCH for literal substring matching: Common Lisp's SEARCH finds a subsequence within a sequence. For strings, this is literal substring matching with no regex engine. It returns the start index or NIL, and serves as the test inside SOME, which stops at the first match.
- *before-request-hook* vs acceptor-dispatch-request: the hook variable is simpler to set up, while the CLOS method is more idiomatic and gives per-acceptor control. For a single-server application, use the hook. For applications with multiple acceptors (for example HTTP and HTTPS on different ports), the CLOS approach allows different filtering rules per acceptor.
- call-next-method in a CLOS override: after the bot check passes in the custom acceptor, call call-next-method to invoke the parent class dispatch. This is idiomatic CLOS. Always call call-next-method for pass-through behaviour unless you are fully replacing the method.
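Here is one reason the NIL default matters. NIL is a symbol, and Common Lisp string functions accept symbols as string designators, so (string-downcase nil) quietly returns "nil" instead of signalling an error. A small normaliser avoids the surprise. safe-ua is a hypothetical helper, not part of Hunchentoot.

```lisp
;;; NIL is a string designator for its name, so this does not signal:
(string-downcase nil)   ; => "nil"

(defun safe-ua (raw)
  "Hypothetical helper: turn a possibly-NIL header value into a lowercase string."
  ;; (or raw "") maps a missing header to the empty string before downcasing.
  (string-downcase (or raw "")))

(safe-ua nil)           ; => ""
(safe-ua "GPTBot/1.1")  ; => "gptbot/1.1"
```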
Framework comparison — Lisp and functional web servers
| Framework | Hook / middleware | Block call | UA header |
|---|---|---|---|
| Hunchentoot (CL) | *before-request-hook* | setf return-code* +http-forbidden+; abort-request-handler | (header-in* :user-agent) |
| Clojure Ring | middleware function | return {:status 403 :body "..."} | (get-in req [:headers "user-agent"]) |
| Erlang Cowboy | init/2 callback | {stop, Reply, State} | cowboy_req:header(<<"user-agent">>, Req) |
| Gleam Wisp | middleware function | return wisp.response_403() | request.get_header(req, "user-agent") |
Hunchentoot's abort is unique among the frameworks in this series. abort-request-handler does not return a response value. Instead it performs a non-local exit (a THROW to a catch tag that Hunchentoot establishes around the handler), so control leaves the handler immediately. The naming conventions (*before-request-hook*, header-in*, return-code*) are idiomatic Common Lisp. *Earmuffs* mark dynamic (special) variables, and a trailing * marks accessors that default to the current request or reply.
Dependencies
```sh
# Install Quicklisp (CL package manager) if not already installed:
#   curl -O https://beta.quicklisp.org/quicklisp.lisp
#   sbcl --load quicklisp.lisp --eval '(quicklisp-quickstart:install)'

# Load Hunchentoot in SBCL (Steel Bank Common Lisp):
#   (ql:quickload :hunchentoot)
# Or add to my-app.asd and load with:
#   (ql:quickload :my-app)

# Run the application:
#   sbcl --load my-app.asd \
#        --eval "(ql:quickload :my-app)" \
#        --eval "(my-app:start-server)"

# Other CL implementations: CCL, ECL, ABCL (JVM-based), CLISP
# Hunchentoot works on all major implementations.
```