How to Block AI Bots on Haskell WAI + Warp: Complete 2026 Guide
WAI (Web Application Interface) is the universal middleware layer for Haskell web — used by Servant, Scotty, and Yesod. Middleware = Application -> Application. To block: call respond with a 403 response directly — the inner Application is never invoked. Headers use CI ByteString keys — case-insensitive comparison encoded in the type.
Two types to know
The second argument to Application is the respond continuation. Call it with a 403 response to short-circuit. Call the inner Application with the request to pass through.
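The continuation-passing shape described above can be tried without installing anything. The sketch below is a toy model using only base: Request, Response, ResponseReceived, countingApp, and blockExample are stand-ins invented for illustration, not the real Network.Wai definitions, and "examplebot" is a hypothetical user agent.

```haskell
import Data.IORef

-- Stand-in types that mirror the shape of WAI's types (NOT the real ones).
newtype Request = Request { userAgent :: String }
data Response = Response { statusCode :: Int, body :: String }
  deriving (Eq, Show)
data ResponseReceived = ResponseReceived

type Application = Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived
type Middleware  = Application -> Application

-- Inner application: counts how often it is invoked, then answers 200.
countingApp :: IORef Int -> Application
countingApp hits _req respond = do
  modifyIORef' hits (+ 1)
  respond (Response 200 "ok")

-- Middleware: answers 403 itself for one hypothetical user agent,
-- otherwise hands the request and the continuation to the inner app.
blockExample :: Middleware
blockExample inner req respond
  | userAgent req == "examplebot" = respond (Response 403 "Forbidden")
  | otherwise                     = inner req respond
```

In the blocked branch the inner application is never referenced, so its counter stays unchanged. That is the whole short-circuit mechanism.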
Protection layers
Step 1 — Bot detection module (src/AiBots.hs)
A top-level list of ByteString patterns. BC.isInfixOf does the substring matching, and any short-circuits on the first match. OverloadedStrings allows string literal syntax for ByteString.
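The matching idea is small enough to try in isolation. This is a minimal sketch that assumes a two-entry pattern list: samplePatterns and matchesSample are hypothetical names used only here, and the lowercasing is folded into the function rather than left to the caller.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BC
import Data.Char (toLower)

-- Hypothetical two-entry subset of the pattern list, for illustration only.
samplePatterns :: [BC.ByteString]
samplePatterns = ["gptbot", "ccbot"]

-- Lowercase the raw header value once, then test each pattern as a substring.
-- `any` stops at the first pattern that matches.
matchesSample :: BC.ByteString -> Bool
matchesSample rawUA = any (`BC.isInfixOf` lowered) samplePatterns
  where
    lowered = BC.map toLower rawUA
```

With these definitions, matchesSample "Mozilla/5.0 (compatible; GPTBot/1.0)" is True and matchesSample "Mozilla/5.0 Firefox" is False.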
-- src/AiBots.hs — bot detection module
module AiBots (isAiBot, aiBotPatterns) where
import qualified Data.ByteString.Char8 as BC
-- Static list of known AI bot UA substrings (lowercase).
-- ByteString literals via OverloadedStrings.
aiBotPatterns :: [BC.ByteString]
aiBotPatterns =
  [ -- OpenAI
    "gptbot", "chatgpt-user", "oai-searchbot"
    -- Anthropic
  , "claudebot", "claude-web"
    -- Common Crawl
  , "ccbot"
    -- Bytedance
  , "bytespider"
    -- Meta
  , "meta-externalagent"
    -- Perplexity
  , "perplexitybot"
    -- Google AI
  , "google-extended", "googleother"
    -- Cohere
  , "cohere-ai"
    -- Amazon
  , "amazonbot"
    -- Diffbot
  , "diffbot"
    -- AI2
  , "ai2bot"
    -- DeepSeek
  , "deepseekbot"
    -- Mistral
  , "mistralai-user"
    -- xAI
  , "xai-bot"
    -- You.com
  , "youbot"
    -- DuckDuckGo AI
  , "duckassistbot"
  ]
-- isAiBot: returns True if ua contains any known AI bot pattern.
-- ua must already be lowercased (BC.map toLower) before calling.
isAiBot :: BC.ByteString -> Bool
isAiBot ua = any (\pattern -> pattern `BC.isInfixOf` ua) aiBotPatterns
Step 2 — WAI middleware (src/BotBlocker.hs)
hUserAgent from Network.HTTP.Types is the header name "User-Agent" as a CI ByteString, so lookup matches any capitalisation of the name. mapResponseHeaders applies a function to the response's header list; here that function prepends X-Robots-Tag on pass-through, leaving the status and body untouched.
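Both behaviours can be checked in a few lines. This is a sketch, assuming the wai, http-types, case-insensitive, and bytestring packages are available; oddCaseHeaders, foundUA, and taggedHeaders are illustrative names, and "ExampleAgent/1.0" is a made-up user agent.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.CaseInsensitive as CI
import Network.HTTP.Types (Header, hUserAgent, status200)
import Network.Wai (mapResponseHeaders, responseHeaders, responseLBS)

-- A header list whose key uses unusual capitalisation.
oddCaseHeaders :: [Header]
oddCaseHeaders = [(CI.mk "uSeR-aGeNt", "ExampleAgent/1.0")]

-- Eq on CI compares the case-folded form, so this lookup still succeeds.
foundUA :: Maybe B.ByteString
foundUA = lookup hUserAgent oddCaseHeaders

-- mapResponseHeaders applies a function to the header list;
-- the section (header :) prepends one entry and keeps the rest.
taggedHeaders :: [Header]
taggedHeaders =
  responseHeaders $
    mapResponseHeaders (("X-Robots-Tag", "noai, noimageai") :) $
      responseLBS status200 [("Content-Type", "text/plain")] "ok"
```

Because the header name comparison lives in the key type, only the header value needs explicit lowercasing in the middleware.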
-- src/BotBlocker.hs — WAI middleware
module BotBlocker (botBlockerMiddleware) where
import Data.Char (toLower)  -- needed for BC.map toLower below
import qualified Data.ByteString.Char8 as BC
import qualified Data.CaseInsensitive as CI
import Network.HTTP.Types (status403, hUserAgent)
import Network.Wai (Middleware, requestHeaders, responseLBS, mapResponseHeaders)
import AiBots (isAiBot)
-- WAI Middleware type: Application -> Application
-- Application type: Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived
--
-- botBlockerMiddleware wraps any WAI Application.
-- Compose: botBlockerMiddleware myApp
botBlockerMiddleware :: Middleware
botBlockerMiddleware innerApp req respond = do
  let headers = requestHeaders req
      -- lookup :: CI ByteString -> [(CI ByteString, ByteString)] -> Maybe ByteString
      -- hUserAgent is the CI ByteString "User-Agent"; CI equality ignores case.
      -- HTTP spec: header names are case-insensitive.
      mUA = lookup hUserAgent headers
      -- maybe "" id :: Maybe ByteString -> ByteString (safe default "")
      -- BC.map toLower for case-insensitive value comparison.
      lowerUA = BC.map toLower (maybe "" id mUA)
  if isAiBot lowerUA
    then
      -- Short-circuit: call respond directly with 403.
      -- innerApp is never called, so no downstream processing happens.
      respond $ responseLBS
        status403
        [ ("Content-Type", "text/plain; charset=utf-8")
        , ("X-Robots-Tag", "noai, noimageai")
        ]
        "Forbidden"
    else do
      -- Pass through: call innerApp, then add X-Robots-Tag to the response.
      -- The (header :) section prepends to the response header list.
      innerApp req $ \response ->
        respond (mapResponseHeaders (("X-Robots-Tag", "noai, noimageai") :) response)
Step 3 — robots.txt middleware (src/RobotsMiddleware.hs)
A guard on rawPathInfo intercepts the path before botBlockerMiddleware runs. All crawlers — including blocked AI bots — can always read robots.txt.
-- src/RobotsMiddleware.hs — serve robots.txt before bot-blocker runs
module RobotsMiddleware (robotsTxtMiddleware) where
import qualified Data.ByteString.Lazy.Char8 as BLC
import Network.HTTP.Types (status200)
import Network.Wai (Middleware, rawPathInfo, responseLBS)
robotsTxt :: BLC.ByteString
robotsTxt = BLC.pack $ unlines
  [ "User-agent: *"
  , "Allow: /"
  , ""
  , "User-agent: GPTBot"
  , "Disallow: /"
  , ""
  , "User-agent: ClaudeBot"
  , "Disallow: /"
  , ""
  , "User-agent: CCBot"
  , "Disallow: /"
  , ""
  , "User-agent: Bytespider"
  , "Disallow: /"
  , ""
  , "User-agent: Google-Extended"
  , "Disallow: /"
  , ""
  , "User-agent: PerplexityBot"
  , "Disallow: /"
  , ""
  , "User-agent: Meta-ExternalAgent"
  , "Disallow: /"
  ]
-- robotsTxtMiddleware: intercepts /robots.txt requests before any other
-- middleware runs. All crawlers — including AI bots — can read it.
robotsTxtMiddleware :: Middleware
robotsTxtMiddleware innerApp req respond
  | rawPathInfo req == "/robots.txt" =
      respond $ responseLBS
        status200
        [("Content-Type", "text/plain; charset=utf-8")]
        robotsTxt
  | otherwise = innerApp req respond
Step 4 — Compose and run (app/Main.hs)
Middleware composes with (.), standard Haskell function composition. The leftmost middleware is the outermost wrapper, so it runs first on the way in. run port app starts the Warp HTTP server.
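The ordering claim is easy to verify with a toy model that needs only base. In this sketch App, Mw, named, and runComposed are hypothetical stand-ins (a String-to-String "application" rather than a real WAI one); each wrapper records its name before delegating.

```haskell
import Data.IORef

-- Toy stand-ins for Application and Middleware (NOT the WAI types).
type App = String -> IO String
type Mw  = App -> App

-- A wrapper that appends its name to a trace, then calls the inner app.
named :: IORef [String] -> String -> Mw
named trace name inner req = do
  modifyIORef' trace (++ [name])
  inner req

-- Compose two wrappers with (.) and return the order in which they ran.
runComposed :: IO [String]
runComposed = do
  trace <- newIORef []
  let app = named trace "robots" . named trace "blocker" $ \_ -> pure "handled"
  _ <- app "GET /"
  readIORef trace
```

Since (f . g) x is f (g x), the left operand wraps the right one. The trace comes back as "robots" followed by "blocker", which matches the order used for the real middleware stack.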
-- app/Main.hs — compose middleware and start Warp
module Main where
import Network.Wai.Handler.Warp (run)
import BotBlocker (botBlockerMiddleware)
import RobotsMiddleware (robotsTxtMiddleware)
import App (application) -- your WAI Application
main :: IO ()
main = do
  let port = 8080
  -- Compose middleware with (.), which wraps right-to-left:
  -- botBlockerMiddleware wraps application, robotsTxtMiddleware wraps that.
  -- Requests therefore hit robotsTxtMiddleware first, then
  -- botBlockerMiddleware, then the inner Application.
  let app = robotsTxtMiddleware . botBlockerMiddleware $ application
  putStrLn $ "Listening on port " <> show port
  run port app
-- app/App.hs — minimal WAI Application example
-- module App (application) where
--
-- import Network.HTTP.Types (status200)
-- import Network.Wai (Application, responseLBS)
-- import qualified Data.ByteString.Lazy.Char8 as BLC
--
-- application :: Application
-- application _req respond =
--   respond $ responseLBS status200
--     [("Content-Type", "text/html; charset=utf-8")]
--     "<html><head><meta name=\"robots\" content=\"noai, noimageai\"></head><body>Welcome</body></html>"
Step 5 — Servant, Scotty, and Yesod integration
Every Haskell web framework on WAI exposes an Application value. The middleware wraps it unchanged — zero framework-specific modifications needed.
-- Integrating with Servant, Scotty, and Yesod
-- The middleware is framework-agnostic — all produce a WAI Application.
-- ---- Servant ----
-- {-# LANGUAGE DataKinds, TypeOperators #-}  (required by the API type below)
-- import Data.Text (Text)
-- import Servant
-- import Network.Wai.Handler.Warp (run)
-- import BotBlocker (botBlockerMiddleware)
-- import RobotsMiddleware (robotsTxtMiddleware)
--
-- type API = "hello" :> Get '[PlainText] Text
--
-- server :: Server API
-- server = return "Hello"
--
-- main :: IO ()
-- main = do
--   let app = robotsTxtMiddleware
--           . botBlockerMiddleware
--           $ serve (Proxy :: Proxy API) server
--   run 8080 app
-- ---- Scotty ----
-- import Web.Scotty (scottyApp, get, text)
-- import Network.Wai.Handler.Warp (run)
-- import BotBlocker (botBlockerMiddleware)
-- import RobotsMiddleware (robotsTxtMiddleware)
--
-- main :: IO ()
-- main = do
--   app <- scottyApp $ do
--     get "/" $ text "Hello"
--   run 8080 (robotsTxtMiddleware . botBlockerMiddleware $ app)
-- ---- Yesod ----
-- import Yesod
-- import Network.Wai.Handler.Warp (run)
-- import BotBlocker (botBlockerMiddleware)
-- import RobotsMiddleware (robotsTxtMiddleware)
--
-- data MyApp = MyApp
-- mkYesod "MyApp" [parseRoutes| / HomeR GET |]
-- instance Yesod MyApp
-- getHomeR :: Handler Html
-- getHomeR = defaultLayout [whamlet|<h1>Hello|]
--
-- main :: IO ()
-- main = do
--   app <- toWaiApp MyApp
--   run 8080 (robotsTxtMiddleware . botBlockerMiddleware $ app)
.cabal dependencies
-- my-app.cabal — dependencies
cabal-version: 3.0
name: my-app
version: 0.1.0.0
executable my-app
  main-is: Main.hs
  hs-source-dirs: app, src
  other-modules: App, AiBots, BotBlocker, RobotsMiddleware
  build-depends:
      base >= 4.14 && < 5
    , wai >= 3.2 && < 3.3
    , warp >= 3.3 && < 3.4
    , http-types >= 0.12 && < 0.13
    , bytestring >= 0.11 && < 0.13
    , case-insensitive >= 1.2 && < 1.3
  default-language: Haskell2010
  default-extensions:
    OverloadedStrings
WAI vs Servant vs Scotty vs Yesod
| Feature | WAI / Warp | Servant | Scotty | Yesod |
|---|---|---|---|---|
| Middleware type | type Middleware = Application -> Application — pure function composition | Same WAI Middleware — Servant produces an Application, wrap it | Same WAI Middleware — scottyApp returns IO Application, wrap after extraction | Same WAI Middleware — toWaiApp returns IO Application, wrap after extraction |
| Short-circuit | Call respond with responseLBS status403 — innerApp never called | Identical — middleware wraps the Servant Application | Identical — middleware wraps the Scotty Application | Identical — or use Yesod's built-in isAuthorized for auth-level checks |
| UA header access | lookup hUserAgent (requestHeaders req) :: Maybe ByteString — CI key, Maybe safe | Same WAI requestHeaders in middleware; Servant Header combinator in handlers | Same WAI in middleware; header "User-Agent" :: ActionM (Maybe Text) in handlers | Same WAI in middleware; lookupHeader "User-Agent" :: HandlerFor site (Maybe ByteString) |
| CI ByteString | CI.mk "user-agent" — case-insensitive ByteString for header name lookup | Same — WAI headers use CI ByteString keys universally | Same at WAI layer; Scotty handler API uses Text | Same at WAI layer; Yesod's lookupHeader also takes a CI ByteString name |
| robots.txt | rawPathInfo req == "/robots.txt" check in robotsTxtMiddleware before bot-blocker | Same WAI middleware approach, or add Raw endpoint in Servant API | Same WAI middleware approach, or get "/robots.txt" handler in ScottyM | Same WAI middleware approach, or StaticR via yesod-static |
| Composition | f . g $ app — standard Haskell function composition, right-to-left wrapping | Same — serve api server gives Application, wrap freely | Same — scottyApp returns IO Application, compose after | Same — toWaiApp returns IO Application, compose after |
Summary
- Middleware = Application -> Application: call respond with a 403 to short-circuit; call innerApp req respond' to pass through.
- CI ByteString keys: WAI encodes case-insensitive header names in the type. lookup hUserAgent headers matches User-Agent, user-agent, and all capitalisations.
- mapResponseHeaders: applies a function to the response header list (here, a prepend) without rebuilding the response body. Use it to add X-Robots-Tag on pass-through.
- Framework-agnostic: Servant, Scotty, Yesod all expose a WAI Application. Wrap once, deploy anywhere.
- Middleware composition with (.): in robotsTxtMiddleware . botBlockerMiddleware $ app, the leftmost middleware fires first. Standard Haskell function composition.