
How to Block AI Bots on OCaml Dream: Complete 2026 Guide

Dream is OCaml's modern full-stack web framework. Its middleware follows the familiar functional pattern: a middleware is a function of type handler -> handler. To block a request, return Dream.respond ~status:`Forbidden without calling the inner handler. Dream.header returns string option, so the type system forces you to handle the absent-header case.

Dream.header returns string option — always handle None

Dream.header request "user-agent" : string option
Use Option.value ~default:"" for a safe empty-string fallback, or pattern match explicitly. Dream normalises header names to lowercase — always pass "user-agent", never "User-Agent".
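As a minimal stdlib-only sketch of the two styles, where the assoc-list lookup_header below is a hypothetical stand-in for Dream.header:

```ocaml
(* Stdlib-only sketch: two ways to handle an optional header value.
   [lookup_header] is a hypothetical stand-in for Dream.header. *)

let lookup_header headers name = List.assoc_opt name headers

let () =
  let headers = [("user-agent", "GPTBot/1.0")] in

  (* Style 1: Option.value with an empty-string default *)
  let ua = lookup_header headers "user-agent" |> Option.value ~default:"" in
  assert (ua = "GPTBot/1.0");

  (* Style 2: explicit pattern match makes the absent case visible *)
  match lookup_header headers "x-missing" with
  | Some _ -> assert false
  | None -> print_endline "header absent, handled explicitly"
```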

Protection layers

1. robots.txt: a Dream.get "/robots.txt" route placed outside the scoped middleware; Dream.from_filesystem serves the file
2. noai meta tag: in the HTML response body, <meta name="robots" content="noai, noimageai"> inside <head>
3. X-Robots-Tag (blocked): ~headers:[("X-Robots-Tag", "noai, noimageai")] in Dream.respond for the 403 response
4. X-Robots-Tag (legitimate): Dream.add_header response "X-Robots-Tag" "noai, noimageai" after inner_handler returns
5. Hard 403: Dream.respond ~status:`Forbidden; inner_handler is never called
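Layer 1 needs an actual file on disk. One possible public/robots.txt, listing a few of the user-agent tokens that correspond to the patterns in Step 1 (adjust the list to your needs):

```text
# public/robots.txt — ask AI crawlers not to crawl; the middleware enforces it
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
```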

Step 1 — Bot detection (lib/ai_bots.ml)

The Re library compiles an alternation of literal patterns once at module load. Re.str treats the string as a literal — no regex interpretation, no escaping needed for hyphens. Re.execp tests for a match anywhere in the string.

(* lib/ai_bots.ml — bot detection module *)

(* Known AI bot UA substrings — lowercase literals.
   The Re library is used for efficient one-pass substring matching.
   Declare the dependency in dune-project: (re (>= 1.10.0)). *)

let patterns = [
  (* OpenAI *)
  "gptbot"; "chatgpt-user"; "oai-searchbot";
  (* Anthropic *)
  "claudebot"; "claude-web";
  (* Common Crawl *)
  "ccbot";
  (* Bytedance *)
  "bytespider";
  (* Meta *)
  "meta-externalagent";
  (* Perplexity *)
  "perplexitybot";
  (* Google AI *)
  "google-extended"; "googleother";
  (* Cohere *)
  "cohere-ai";
  (* Amazon *)
  "amazonbot";
  (* Diffbot *)
  "diffbot";
  (* AI2 *)
  "ai2bot";
  (* DeepSeek *)
  "deepseekbot";
  (* Mistral *)
  "mistralai-user";
  (* xAI *)
  "xai-bot";
  (* You.com *)
  "youbot";
  (* DuckDuckGo AI *)
  "duckassistbot";
]

(* Re.str creates a literal pattern — no regex interpretation.
   Re.alt builds an alternation; Re.compile compiles once at module load.
   Re.execp tests if the pattern matches anywhere in the string. *)
let bot_re =
  patterns
  |> List.map Re.str
  |> Re.alt
  |> Re.compile

(* is_ai_bot: returns true if ua matches any known AI bot pattern.
   ua should already be lowercased before calling. *)
let is_ai_bot ua =
  String.length ua > 0 && Re.execp bot_re ua
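If you would rather avoid the Re dependency, the same check can be sketched with only the stdlib. This naive O(n*m) scan is fine for short user-agent strings; contains and is_ai_bot_stdlib are illustrative names, not part of the module above:

```ocaml
(* Dependency-free alternative sketch: naive substring scan, stdlib only. *)

let contains ~needle haystack =
  let nl = String.length needle and hl = String.length haystack in
  if nl = 0 then true
  else begin
    let found = ref false in
    (* Try every start offset; stop checking once a match is found. *)
    for i = 0 to hl - nl do
      if (not !found) && String.sub haystack i nl = needle then found := true
    done;
    !found
  end

let is_ai_bot_stdlib patterns ua =
  String.length ua > 0
  && List.exists (fun p -> contains ~needle:p ua) patterns

let () =
  let patterns = ["gptbot"; "claudebot"; "ccbot"] in
  assert (is_ai_bot_stdlib patterns "mozilla/5.0 (compatible; gptbot/1.2)");
  assert (not (is_ai_bot_stdlib patterns "mozilla/5.0 firefox/126.0"))
```

The Re version from Step 1 compiles the alternation once and scans in a single pass, so prefer it when the pattern list grows.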

Step 2 — Middleware (lib/bot_blocker.ml)

let* is Lwt's bind syntax (monadic sequencing), provided by the Lwt.Syntax module; bring it into scope with open Lwt.Syntax. The begin...end block groups the pass-through branch. Dream.add_header mutates the response in place, and Lwt.return then lifts it back into the promise.

(* lib/bot_blocker.ml — Dream middleware *)

(* Dream.middleware type: Dream.handler -> Dream.handler
   Dream.handler type:    Dream.request -> Dream.response Lwt.t
   (Dream uses Lwt for async; let* below is Lwt's bind) *)

open Lwt.Syntax

let bot_blocker inner_handler request =
  (* Dream.header normalises header names to lowercase internally.
     Always use "user-agent" not "User-Agent".
     Returns string option — Option.value provides safe default "". *)
  let ua =
    Dream.header request "user-agent"
    |> Option.value ~default:""
    |> String.lowercase_ascii
  in
  if Ai_bots.is_ai_bot ua then
    (* Short-circuit: respond with 403 directly.
       inner_handler is never called — no downstream processing. *)
    Dream.respond
      ~status:`Forbidden
      ~headers:[
        ("Content-Type",  "text/plain; charset=utf-8");
        ("X-Robots-Tag",  "noai, noimageai");
      ]
      "Forbidden"
  else begin
    (* Pass through: call inner_handler, then add X-Robots-Tag. *)
    let* response = inner_handler request in
    Dream.add_header response "X-Robots-Tag" "noai, noimageai";
    Lwt.return response
  end

Step 3 — Router with global protection (bin/main.ml)

Dream.scope takes a path prefix, a list of middlewares, and a list of routes. Routes outside the scope (robots.txt, /health) bypass bot-blocking entirely. Dream.from_filesystem serves a single file from a directory.

(* bin/main.ml — router with scoped bot blocking *)

let () =
  Dream.run
  @@ Dream.logger
  @@ Dream.router [

    (* robots.txt — served OUTSIDE the bot_blocker scope.
       All crawlers, including AI bots, must be able to read it.
       Dream.from_filesystem local_root path is itself a handler. *)
    Dream.get "/robots.txt" (Dream.from_filesystem "public" "robots.txt");

    (* /health — also outside bot-blocker scope *)
    Dream.get "/health" (fun _ ->
      Dream.respond "ok"
    );

    (* Dream.scope: apply bot_blocker middleware to all routes under "/" *)
    (* Syntax: Dream.scope prefix [middlewares] [routes] *)
    Dream.scope "/" [Bot_blocker.bot_blocker] [

      Dream.get "/" (fun _ ->
        Dream.respond
          ~headers:[("Content-Type", "text/html; charset=utf-8")]
          {|<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noai, noimageai">
  <title>My Site</title>
</head>
<body><h1>Welcome</h1></body>
</html>|}
      );

      Dream.get "/api/data" (fun _ ->
        Dream.respond
          ~headers:[("Content-Type", "application/json")]
          {|{"data":"protected"}|}
      );
    ];
  ]

Step 4 — Scoped protection and middleware stacking

Apply bot_blocker only to /api/* routes. Public routes and robots.txt remain at the top level. Multiple middlewares in the Dream.scope list compose left-to-right.

(* Scoped protection — only /api/* routes get the bot check *)

let () =
  Dream.run
  @@ Dream.logger
  @@ Dream.router [

    (* Public — no bot check *)
    Dream.get "/robots.txt" (Dream.from_filesystem "public" "robots.txt");
    Dream.get "/"       (fun _ -> Dream.respond "Welcome");
    Dream.get "/health" (fun _ -> Dream.respond "ok");

    (* Protected — bot_blocker applied only to /api/*.
       api_data_handler and api_submit_handler are assumed defined above. *)
    Dream.scope "/api" [Bot_blocker.bot_blocker] [
      Dream.get "/data"   api_data_handler;
      Dream.post "/submit" api_submit_handler;
    ];
  ]

(* Multiple middleware layers — apply left-to-right *)
(* Dream.scope "/api" [rate_limiter; bot_blocker] [...] *)
(* rate_limiter runs first, bot_blocker second *)
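The ordering can be illustrated without Dream at all. In this framework-free sketch a handler is just string -> string; folding the middleware list right-to-left leaves the head of the list outermost, so it runs first on the way in, mirroring the [rate_limiter; bot_blocker] behaviour above:

```ocaml
(* Framework-free sketch of middleware composition order.
   A handler is string -> string here; a middleware wraps a handler. *)

type handler = string -> string
type middleware = handler -> handler

(* A middleware that tags the response with its own name. *)
let tag name : middleware =
  fun inner req -> name ^ ">" ^ inner req

(* Fold right-to-left so the head of the list wraps everything after it. *)
let apply_all (ms : middleware list) (h : handler) : handler =
  List.fold_right (fun m acc -> m acc) ms h

let () =
  let base req = "handled:" ^ req in
  let h = apply_all [tag "rate_limiter"; tag "bot_blocker"] base in
  (* rate_limiter is outermost: it runs first, bot_blocker second. *)
  assert (h "req" = "rate_limiter>bot_blocker>handled:req")
```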

dune-project / dune dependencies

; dune-project: package stanza (dune generates my-app.opam from it)

(package
 (name my-app)
 (depends
  (ocaml (>= 5.0.0))
  (dream (>= 1.0.0~alpha8))
  (re    (>= 1.10.0))
  (lwt   (>= 5.6.0))))

; bin/dune: executable stanza

(executable
 (name main)
 (libraries dream re lwt))

# Install and run:
#   opam install dream re lwt
#   dune exec ./bin/main.exe
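Once the server is running, the behaviour can be spot-checked with curl; this assumes Dream's default localhost:8080 listen address:

```shell
# Normal browser UA: passes through, response carries X-Robots-Tag
curl -i -A "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0" http://localhost:8080/

# Known AI bot UA: 403, the inner handler is never reached
curl -i -A "GPTBot/1.2" http://localhost:8080/

# robots.txt is outside the scope: readable by everyone, including bots
curl -i -A "GPTBot/1.2" http://localhost:8080/robots.txt
```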

Dream vs Opium vs CoHTTP vs MirageOS

Middleware type
  • Dream: handler -> handler, where handler = request -> response Lwt.t (pure function composition)
  • Opium: Opium.Middleware.t, i.e. Rock.Handler.t -> Rock.Handler.t (same pattern via the Rock abstraction)
  • CoHTTP: no built-in middleware; manual handler composition via Server.respond
  • MirageOS: Conduit + CoHTTP in a unikernel; dispatcher pattern, no middleware chain

Short-circuit
  • Dream: Dream.respond ~status:`Forbidden without calling inner_handler
  • Opium: Opium.Response.of_plain_text ~status:`Forbidden without calling next
  • CoHTTP: Cohttp_lwt_unix.Server.respond_string ~status:`Forbidden in the handler
  • MirageOS: respond with `Forbidden in the unikernel dispatch function

UA header access
  • Dream: Dream.header request "user-agent" : string option (lowercase name; Option.value for the default)
  • Opium: Opium.Request.header "user-agent" request : string option (same pattern)
  • CoHTTP: Cohttp.Header.get (Request.headers req) "user-agent" : string option
  • MirageOS: Cohttp.Header.get (Request.headers req) "user-agent" in the unikernel handler

Substring matching
  • Dream: Re.execp (Re.compile (Re.str pattern)) ua (Re library, literal match)
  • Opium: same Re library approach (identical OCaml ecosystem)
  • CoHTTP: same Re library, or String.split_on_char + List.exists
  • MirageOS: same OCaml stdlib / Re library

robots.txt
  • Dream: Dream.get "/robots.txt" outside the scoped middleware; Dream.from_filesystem for the file
  • Opium: separate route without the bot-blocker middleware; Opium.Middleware.static for directory serving
  • CoHTTP: manual path check in the handler: if Uri.path uri = "/robots.txt" then ...
  • MirageOS: explicit dispatch branch in the unikernel for the /robots.txt path

Async model
  • Dream: Lwt (monadic async); let* for bind, Lwt.return to lift
  • Opium: Lwt, same as Dream (both built on the OCaml async ecosystem)
  • CoHTTP: Lwt; Cohttp_lwt is the Lwt HTTP library
  • MirageOS: Lwt, same model, compiled to a unikernel binary

Summary

  • Return without calling inner_handler: Dream.respond ~status:`Forbidden short-circuits directly; the inner handler and all downstream processing never run.
  • string option header access: Dream.header returns an option. Use Option.value ~default:""; the type system enforces handling the absent case.
  • Re.str for literal patterns: Re.str treats the input as a literal, not a regex, so hyphens in bot names need no escaping. Compile once at module load with Re.compile.
  • Dream.scope for scoping: apply middleware to a path prefix only; robots.txt and public routes live outside the scope.
  • Dream normalises header keys: always query "user-agent" (lowercase). Header names are case-insensitive per the HTTP spec, and Dream stores them lowercased.
