How to Block AI Bots on OCaml Dream: Complete 2026 Guide
Dream is OCaml's modern full-stack web framework. Middleware follows the same pattern as every functional HTTP library: handler -> handler. To block: return Dream.respond ~status:`Forbidden without calling the inner handler. Dream.header returns string option — the type system enforces the absent-header case.
Dream.header returns string option — always handle None
Dream.header request "user-agent" : string option
Use Option.value ~default:"" for a safe empty-string fallback, or pattern match explicitly. Dream normalises header names to lowercase — always pass "user-agent", never "User-Agent".
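A minimal sketch of both styles in plain OCaml, with no Dream dependency — `find_header` here is a hypothetical stand-in for `Dream.header request`:

```ocaml
(* Hypothetical stand-in for Dream.header request: returns string option. *)
let find_header = function
  | "user-agent" -> Some "Mozilla/5.0 GPTBot/1.0"
  | _ -> None

(* Style 1: Option.value collapses None to a safe default. *)
let ua_default = Option.value (find_header "user-agent") ~default:""

(* Style 2: explicit pattern match — the compiler forces both cases. *)
let ua_matched =
  match find_header "user-agent" with
  | Some ua -> String.lowercase_ascii ua
  | None -> ""
```

Either style is fine; the point is that the `string option` return type makes forgetting the missing-header case a compile error, not a runtime crash.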
Protection layers
Step 1 — Bot detection (lib/ai_bots.ml)
The Re library compiles an alternation of literal patterns once at module load. Re.str treats the string as a literal — no regex interpretation, no escaping needed for hyphens. Re.execp tests for a match anywhere in the string.
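As a standalone illustration of that pipeline (a sketch assuming the `re` opam package is installed; the two patterns are arbitrary examples):

```ocaml
(* Compile a literal alternation once. Re.str escapes nothing because it
   treats the whole string as a literal — hyphens included. *)
let re = Re.compile (Re.alt [ Re.str "gptbot"; Re.str "claude-web" ])

(* Re.execp returns true if the pattern matches anywhere in the input. *)
let () =
  assert (Re.execp re "mozilla/5.0 (compatible; gptbot/1.2)");
  assert (not (Re.execp re "mozilla/5.0 (windows nt 10.0)"))
```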
(* lib/ai_bots.ml — bot detection module *)
(* Known AI bot UA substrings — lowercase literals.
   The Re library is used for efficient substring matching.
   Declare the dependency in dune-project: (re (>= 1.10.0)) *)
let patterns = [
  (* OpenAI *)
  "gptbot"; "chatgpt-user"; "oai-searchbot";
  (* Anthropic *)
  "claudebot"; "claude-web";
  (* Common Crawl *)
  "ccbot";
  (* Bytedance *)
  "bytespider";
  (* Meta *)
  "meta-externalagent";
  (* Perplexity *)
  "perplexitybot";
  (* Google AI *)
  "google-extended"; "googleother";
  (* Cohere *)
  "cohere-ai";
  (* Amazon *)
  "amazonbot";
  (* Diffbot *)
  "diffbot";
  (* AI2 *)
  "ai2bot";
  (* DeepSeek *)
  "deepseekbot";
  (* Mistral *)
  "mistralai-user";
  (* xAI *)
  "xai-bot";
  (* You.com *)
  "youbot";
  (* DuckDuckGo AI *)
  "duckassistbot";
]
(* Re.str creates a literal pattern — no regex interpretation.
   Re.alt builds an alternation; Re.compile compiles once at module load.
   Re.execp tests if the pattern matches anywhere in the string. *)
let bot_re =
  patterns
  |> List.map Re.str
  |> Re.alt
  |> Re.compile

(* is_ai_bot: returns true if ua matches any known AI bot pattern.
   ua should already be lowercased before calling. *)
let is_ai_bot ua =
  String.length ua > 0 && Re.execp bot_re ua

Step 2 — Middleware (lib/bot_blocker.ml)
let* is Lwt's bind operator, brought into scope by open Lwt.Syntax. The begin...end block groups the pass-through branch. Dream.add_header mutates the response in place before it is returned.
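For readers new to Lwt, `let*` desugars to `Lwt.bind`. A rough sketch of the equivalence (assumes the `lwt` opam package; the `+ 1` body is arbitrary):

```ocaml
open Lwt.Syntax

(* These two functions are equivalent:
   let* x = p in body  ≡  Lwt.bind p (fun x -> body) *)
let with_syntax promise =
  let* x = promise in
  Lwt.return (x + 1)

let with_bind promise =
  Lwt.bind promise (fun x -> Lwt.return (x + 1))
```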
(* lib/bot_blocker.ml — Dream middleware *)
(* Dream.middleware type: Dream.handler -> Dream.handler
   Dream.handler type: Dream.request -> Dream.response Lwt.t
   (Dream is built on Lwt for async; let* below is Lwt's bind) *)
open Lwt.Syntax

let bot_blocker inner_handler request =
  (* Dream.header normalises header names to lowercase internally.
     Always use "user-agent", not "User-Agent".
     Returns string option — Option.value provides the safe default "". *)
  let ua =
    Dream.header request "user-agent"
    |> Option.value ~default:""
    |> String.lowercase_ascii
  in
  if Ai_bots.is_ai_bot ua then
    (* Short-circuit: respond with 403 directly.
       inner_handler is never called — no downstream processing. *)
    Dream.respond
      ~status:`Forbidden
      ~headers:[
        ("Content-Type", "text/plain; charset=utf-8");
        ("X-Robots-Tag", "noai, noimageai");
      ]
      "Forbidden"
  else begin
    (* Pass through: call inner_handler, then add X-Robots-Tag. *)
    let* response = inner_handler request in
    Dream.add_header response "X-Robots-Tag" "noai, noimageai";
    Lwt.return response
  end

Step 3 — Router with global protection (bin/main.ml)
Dream.scope takes a path prefix, a list of middlewares, and a list of routes. Routes outside the scope (robots.txt, /health) bypass bot-blocking entirely. Dream.from_filesystem serves a single file from a directory.
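The file served from public/robots.txt might look like this — an illustrative subset; the exact token list is up to you, and the middleware above enforces the policy for bots that ignore it:

```
# public/robots.txt
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
```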
(* bin/main.ml — router with scoped bot blocking *)
let () =
  Dream.run
  @@ Dream.logger
  @@ Dream.router [

    (* robots.txt — served OUTSIDE the bot_blocker scope.
       All crawlers, including AI bots, must be able to read it.
       Dream.from_filesystem "public" "robots.txt" is itself a handler. *)
    Dream.get "/robots.txt"
      (Dream.from_filesystem "public" "robots.txt");

    (* /health — also outside the bot-blocker scope *)
    Dream.get "/health" (fun _ ->
      Dream.respond "ok");

    (* Dream.scope: apply bot_blocker middleware to all routes under "/".
       Syntax: Dream.scope prefix [middlewares] [routes] *)
    Dream.scope "/" [Bot_blocker.bot_blocker] [
      Dream.get "/" (fun _ ->
        Dream.respond
          ~headers:[("Content-Type", "text/html; charset=utf-8")]
          {|<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="noai, noimageai">
<title>My Site</title>
</head>
<body><h1>Welcome</h1></body>
</html>|});
      Dream.get "/api/data" (fun _ ->
        Dream.respond
          ~headers:[("Content-Type", "application/json")]
          {|{"data":"protected"}|});
    ];
  ]

Step 4 — Scoped protection and middleware stacking
Apply bot_blocker only to /api/* routes. Public routes and robots.txt remain at the top level. Multiple middlewares in the Dream.scope list compose left-to-right.
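To see the stacking order concretely, here is a hedged sketch with a second, toy middleware — request_logger is hypothetical, not part of Dream:

```ocaml
(* Toy middleware: logs the request path, then delegates to the inner
   handler. It runs before bot_blocker when listed first in the scope. *)
let request_logger inner_handler request =
  Dream.log "hit: %s" (Dream.target request);
  inner_handler request

(* Middlewares compose left-to-right: request_logger wraps bot_blocker,
   which wraps each route handler. *)
let api_routes =
  Dream.scope "/api" [request_logger; Bot_blocker.bot_blocker] [
    Dream.get "/data" (fun _ -> Dream.respond {|{"data":"protected"}|});
  ]
```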
(* Scoped protection — only /api/* routes get the bot check *)
let () =
  Dream.run
  @@ Dream.logger
  @@ Dream.router [
    (* Public — no bot check *)
    Dream.get "/robots.txt"
      (Dream.from_filesystem "public" "robots.txt");
    Dream.get "/" (fun _ -> Dream.respond "Welcome");
    Dream.get "/health" (fun _ -> Dream.respond "ok");

    (* Protected — bot_blocker applied only to /api/* *)
    Dream.scope "/api" [Bot_blocker.bot_blocker] [
      Dream.get "/data" api_data_handler;
      Dream.post "/submit" api_submit_handler;
    ];
  ]

(* Multiple middleware layers — applied left-to-right *)
(* Dream.scope "/api" [rate_limiter; bot_blocker] [...] *)
(* rate_limiter runs first, bot_blocker second *)

dream.opam / dune dependencies
; dune-project — dune generates my-app.opam from this package stanza
(lang dune 3.0)
(generate_opam_files true)
(package
 (name my-app)
 (depends
  (ocaml (>= 5.0.0))
  (dream (>= 1.0.0~alpha8))
  (re (>= 1.10.0))
  (lwt (>= 5.6.0))))

; bin/dune — executable stanza
(executable
 (name main)
 (libraries dream re lwt))

# Install and run:
# opam install dream re lwt
# dune exec ./bin/main.exe

Dream vs Opium vs CoHTTP vs MirageOS
| Feature | OCaml Dream | Opium | CoHTTP | MirageOS |
|---|---|---|---|---|
| Middleware type | handler -> handler where handler = request -> response Lwt.t — pure function composition | Opium.Middleware.t — Rock.Handler.t -> Rock.Handler.t (same pattern, Rock abstraction) | No built-in middleware — manual handler composition via Server.respond | Conduit + CoHTTP in MirageOS unikernel — dispatcher pattern, no middleware chain |
| Short-circuit | Dream.respond ~status:`Forbidden without calling inner_handler | Opium.Response.of_plain_text ~status:`Forbidden without calling next | Cohttp_lwt_unix.Server.respond_string ~status:`Forbidden in handler | Dispatch.respond ~status:`Forbidden in unikernel dispatch function |
| UA header access | Dream.header request "user-agent" : string option — lowercase, Option.value for default | Opium.Request.header "user-agent" request : string option — same pattern | Cohttp.Header.get (Request.headers req) "user-agent" : string option | Cohttp.Header.get (Request.headers req) "user-agent" in unikernel handler |
| Substring matching | Re.execp (Re.compile (Re.str pattern)) ua — Re library, literal match | Same Re library approach — identical OCaml ecosystem | Same Re library, or String.split_on_char + List.exists | Same OCaml stdlib / Re library |
| robots.txt | Dream.get "/robots.txt" outside scoped middleware — Dream.from_filesystem for file | Separate route without bot-blocker middleware; Opium.Middleware.static for dir serving | Manual path check in handler: if Uri.path uri = "/robots.txt" then ... | Explicit dispatch branch in unikernel for /robots.txt path |
| Async model | Lwt (monadic async) — let* from Lwt.Syntax for bind, Lwt.return to lift | Lwt — same as Dream (both built on the OCaml async ecosystem) | Lwt — Cohttp is the Lwt HTTP library | Lwt in MirageOS — same model, compiled to a unikernel binary |
Summary
- Return without calling inner_handler — Dream.respond ~status:`Forbidden short-circuits directly. The inner handler and all downstream processing never run.
- string option header access — Dream.header returns an option. Use Option.value ~default:"" — the type system enforces handling the absent case.
- Re.str for literal patterns — Re.str treats the input as a literal, not a regex, so hyphens in bot names need no escaping. Compile once at module load with Re.compile.
- Dream.scope for scoping — apply middleware to a path prefix only; robots.txt and public routes live outside the scope.
- Dream normalises header keys — always query "user-agent" (lowercase); Dream lowercases all header names internally.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.