How to Block AI Bots on Haskell WAI + Warp: Complete 2026 Guide

WAI (Web Application Interface) is the common interface that Haskell web frameworks such as Servant, Scotty, and Yesod are built on. A middleware is a plain function: Middleware = Application -> Application. To block a request, call respond directly with a 403 response, so the inner Application is never invoked. Header names are CI ByteString keys, which encodes case-insensitive comparison in the type.

Two types to know

type Application = Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived
type Middleware = Application -> Application

The second argument to Application is the respond continuation. Call it with a 403 response to short-circuit. Call the inner Application with the request to pass through.
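
To see the short-circuit mechanics in isolation, the following is a toy model that mirrors the shape of the two types using only base. ToyRequest, ToyResponse, Done, okApp, blockIf, and runToy are hypothetical stand-ins invented for this sketch; they are not WAI's real types or functions.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)

-- Toy stand-ins for WAI's Request, Response, and ResponseReceived.
newtype ToyRequest  = ToyRequest  { toyPath   :: String }
newtype ToyResponse = ToyResponse { toyStatus :: Int }
data Done = Done

-- Same shape as WAI's Application and Middleware.
type ToyApplication = ToyRequest -> (ToyResponse -> IO Done) -> IO Done
type ToyMiddleware  = ToyApplication -> ToyApplication

-- The inner application always answers 200.
okApp :: ToyApplication
okApp _req respond = respond (ToyResponse 200)

-- Short-circuit with 403 when the predicate holds; otherwise hand the
-- request and the untouched continuation to the inner application.
blockIf :: (ToyRequest -> Bool) -> ToyMiddleware
blockIf blocked inner req respond
  | blocked req = respond (ToyResponse 403)
  | otherwise   = inner req respond

-- Drive an application in-process and capture the status it responded with.
runToy :: ToyApplication -> ToyRequest -> IO Int
runToy app req = do
  ref <- newIORef 0
  _ <- app req (\resp -> writeIORef ref (toyStatus resp) >> pure Done)
  readIORef ref
```

With blockIf ((== "/private") . toyPath) okApp, a request for "/private" never reaches okApp, while any other path is passed through unchanged.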

Protection layers

1. robots.txt: robotsTxtMiddleware checks rawPathInfo == "/robots.txt" and intercepts before botBlockerMiddleware runs.
2. noai meta tag: in the HTML response body, <meta name="robots" content="noai, noimageai"> inside <head>.
3. X-Robots-Tag (blocked): ("X-Robots-Tag", "noai, noimageai") in the responseLBS headers of the 403 response.
4. X-Robots-Tag (legitimate): mapResponseHeaders (("X-Robots-Tag", "noai, noimageai") :) applied to the inner Application's response.
5. Hard 403: respond $ responseLBS status403; innerApp is never called, so there is no downstream processing.

Step 1 — Bot detection module (src/AiBots.hs)

A top-level list of ByteString patterns. BC.isInfixOf does the substring matching, and any short-circuits on the first match. OverloadedStrings allows string-literal syntax for ByteString values.

-- src/AiBots.hs — bot detection module

module AiBots (isAiBot, aiBotPatterns) where

import qualified Data.ByteString.Char8 as BC

-- Static list of known AI bot UA substrings (lowercase).
-- ByteString literals via OverloadedStrings.
aiBotPatterns :: [BC.ByteString]
aiBotPatterns =
  [ -- OpenAI
    "gptbot", "chatgpt-user", "oai-searchbot"
    -- Anthropic
  , "claudebot", "claude-web"
    -- Common Crawl
  , "ccbot"
    -- Bytedance
  , "bytespider"
    -- Meta
  , "meta-externalagent"
    -- Perplexity
  , "perplexitybot"
    -- Google AI
  , "google-extended", "googleother"
    -- Cohere
  , "cohere-ai"
    -- Amazon
  , "amazonbot"
    -- Diffbot
  , "diffbot"
    -- AI2
  , "ai2bot"
    -- DeepSeek
  , "deepseekbot"
    -- Mistral
  , "mistralai-user"
    -- xAI
  , "xai-bot"
    -- You.com
  , "youbot"
    -- DuckDuckGo AI
  , "duckassistbot"
  ]

-- isAiBot: returns True if ua contains any known AI bot pattern.
-- ua must already be lowercased (BC.map toLower) before calling.
isAiBot :: BC.ByteString -> Bool
isAiBot ua = any (\pattern -> pattern `BC.isInfixOf` ua) aiBotPatterns
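
Because isAiBot expects an already-lowercased value, it is easy to call it with a raw header by mistake. One way to guard against that, sketched here with an abbreviated pattern list, is a wrapper that normalises the input itself. samplePatterns and isAiBotRaw are hypothetical names introduced for this example, not part of the module above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BC
import           Data.Char             (toLower)

-- Abbreviated pattern list for the sketch (all lowercase).
samplePatterns :: [BC.ByteString]
samplePatterns = ["gptbot", "claudebot", "ccbot"]

-- Lowercase the raw User-Agent value, then run the substring scan.
-- Callers no longer need to remember to normalise first.
isAiBotRaw :: BC.ByteString -> Bool
isAiBotRaw rawUA = any (`BC.isInfixOf` lowered) samplePatterns
  where
    lowered = BC.map toLower rawUA
```

A mixed-case value such as "Mozilla/5.0 (compatible; GPTBot/1.2)" matches, while an ordinary browser User-Agent does not.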

Step 2 — WAI middleware (src/BotBlocker.hs)

hUserAgent from Network.HTTP.Types is the CI ByteString for "User-Agent"; the CI wrapper makes the lookup case-insensitive. On pass-through, mapResponseHeaders applies a function to the response's header list (here, a prepend) without rebuilding the rest of the response.

-- src/BotBlocker.hs — WAI middleware

module BotBlocker (botBlockerMiddleware) where

import qualified Data.ByteString.Char8 as BC
import qualified Data.CaseInsensitive  as CI
import           Data.Char             (toLower)  -- needed by BC.map toLower below
import           Network.HTTP.Types    (status403, hUserAgent)
import           Network.Wai           (Middleware, requestHeaders,
                                        responseLBS, mapResponseHeaders)
import           AiBots                (isAiBot)

-- WAI Middleware type: Application -> Application
-- Application type:    Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived
--
-- botBlockerMiddleware wraps any WAI Application.
-- Compose: botBlockerMiddleware myApp
botBlockerMiddleware :: Middleware
botBlockerMiddleware innerApp req respond = do
  let headers  = requestHeaders req
      -- lookup :: CI ByteString -> [(CI ByteString, ByteString)] -> Maybe ByteString
      -- hUserAgent = CI.mk "User-Agent" — the CI wrapper handles case-insensitive matching.
      -- HTTP spec: header names are case-insensitive.
      mUA      = lookup hUserAgent headers
      -- maybe "" id :: Maybe ByteString -> ByteString (safe default "")
      -- BC.map toLower for case-insensitive value comparison.
      lowerUA  = BC.map toLower (maybe "" id mUA)

  if isAiBot lowerUA
    then
      -- Short-circuit: call respond directly with 403.
      -- innerApp is never called — no downstream processing.
      respond $ responseLBS
        status403
        [ ("Content-Type",  "text/plain; charset=utf-8")
        , ("X-Robots-Tag",  "noai, noimageai")
        ]
        "Forbidden"
    else do
      -- Pass through: call innerApp, then add X-Robots-Tag to the response.
      -- mapResponseHeaders prepends to the response header list.
      innerApp req $ \response ->
        respond (mapResponseHeaders (("X-Robots-Tag", "noai, noimageai") :) response)
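
A WAI Application is an ordinary function, so a middleware like this can be exercised in-process without starting Warp. The sketch below assumes the wai and http-types packages are available. miniBlocker is a cut-down, single-pattern stand-in for botBlockerMiddleware, and helloApp and statusFor are hypothetical helpers. Using the ResponseReceived constructor from Network.Wai.Internal is a test-only shortcut, not something application code should rely on.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BC
import           Data.Char             (toLower)
import           Data.IORef            (newIORef, readIORef, writeIORef)
import           Data.Maybe            (fromMaybe)
import           Network.HTTP.Types    (hUserAgent, status200, status403, statusCode)
import           Network.Wai           (Application, Middleware, defaultRequest,
                                        requestHeaders, responseLBS, responseStatus)
import           Network.Wai.Internal  (ResponseReceived (..))

-- Cut-down blocker with one pattern, standing in for botBlockerMiddleware.
miniBlocker :: Middleware
miniBlocker inner req respond
  | "gptbot" `BC.isInfixOf` ua = respond (responseLBS status403 [] "Forbidden")
  | otherwise                  = inner req respond
  where
    ua = BC.map toLower (fromMaybe "" (lookup hUserAgent (requestHeaders req)))

-- Trivial inner application that always answers 200.
helloApp :: Application
helloApp _req respond = respond (responseLBS status200 [] "ok")

-- Call the application directly with a given User-Agent and return the
-- numeric status code of whatever response it hands to the continuation.
statusFor :: Application -> BC.ByteString -> IO Int
statusFor app ua = do
  ref <- newIORef 0
  let req = defaultRequest { requestHeaders = [(hUserAgent, ua)] }
  _ <- app req $ \resp -> do
    writeIORef ref (statusCode (responseStatus resp))
    pure ResponseReceived
  readIORef ref
```

Calling statusFor (miniBlocker helloApp) with a GPTBot User-Agent should yield the 403 path, and with a browser User-Agent the 200 path. The same driver can be pointed at the real botBlockerMiddleware.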

Step 3 — robots.txt middleware (src/RobotsMiddleware.hs)

A guard on rawPathInfo intercepts the path before botBlockerMiddleware runs. All crawlers — including blocked AI bots — can always read robots.txt.

-- src/RobotsMiddleware.hs — serve robots.txt before bot-blocker runs

module RobotsMiddleware (robotsTxtMiddleware) where

import qualified Data.ByteString.Lazy.Char8 as BLC
import           Network.HTTP.Types          (status200)
import           Network.Wai                 (Middleware, rawPathInfo,
                                              responseLBS)

robotsTxt :: BLC.ByteString
robotsTxt = BLC.pack $ unlines
  [ "User-agent: *"
  , "Allow: /"
  , ""
  , "User-agent: GPTBot"
  , "Disallow: /"
  , ""
  , "User-agent: ClaudeBot"
  , "Disallow: /"
  , ""
  , "User-agent: CCBot"
  , "Disallow: /"
  , ""
  , "User-agent: Bytespider"
  , "Disallow: /"
  , ""
  , "User-agent: Google-Extended"
  , "Disallow: /"
  , ""
  , "User-agent: PerplexityBot"
  , "Disallow: /"
  , ""
  , "User-agent: Meta-ExternalAgent"
  , "Disallow: /"
  ]

-- robotsTxtMiddleware: intercepts /robots.txt requests before any other
-- middleware runs. All crawlers — including AI bots — can read it.
robotsTxtMiddleware :: Middleware
robotsTxtMiddleware innerApp req respond
  | rawPathInfo req == "/robots.txt" =
      respond $ responseLBS
        status200
        [("Content-Type", "text/plain; charset=utf-8")]
        robotsTxt
  | otherwise = innerApp req respond
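
The robots.txt body above repeats the same two-line group for every crawler. As an alternative sketch, the body could be generated from a list of agent names so that adding a crawler is a one-word change. robotsTxtFor is a hypothetical helper, not part of the module above.

```haskell
-- Build a robots.txt body: an allow-all default group, followed by one
-- "User-agent / Disallow" group per blocked crawler, separated by blank lines.
robotsTxtFor :: [String] -> String
robotsTxtFor agents = unlines (defaultGroup ++ concatMap blockGroup agents)
  where
    defaultGroup  = ["User-agent: *", "Allow: /"]
    blockGroup ua = ["", "User-agent: " ++ ua, "Disallow: /"]
```

Wrapping the result in BLC.pack gives a lazy ByteString that can be passed to responseLBS in place of the hand-written robotsTxt value.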

Step 4 — Compose and run (app/Main.hs)

Middleware composes with (.), standard Haskell function composition. The leftmost middleware is the outermost wrapper, so it sees the request first on the way in. run port app starts the Warp HTTP server.

-- app/Main.hs — compose middleware and start Warp

module Main where

import Network.Wai.Handler.Warp (run)
import BotBlocker               (botBlockerMiddleware)
import RobotsMiddleware          (robotsTxtMiddleware)
import App                       (application)  -- your WAI Application

main :: IO ()
main = do
  let port = 8080
  -- Compose middleware with (.): wrapping happens right-to-left, so
  -- botBlockerMiddleware wraps application and robotsTxtMiddleware wraps both.
  -- Requests therefore flow left-to-right: robotsTxtMiddleware fires first,
  -- then botBlockerMiddleware, then the inner Application.
  let app = robotsTxtMiddleware . botBlockerMiddleware $ application
  putStrLn $ "Listening on port " <> show port
  run port app

-- app/App.hs — minimal WAI Application example
-- module App (application) where
--
-- import Network.HTTP.Types  (status200)
-- import Network.Wai         (Application, responseLBS)
-- import qualified Data.ByteString.Lazy.Char8 as BLC
--
-- application :: Application
-- application _req respond =
--   respond $ responseLBS status200
--     [("Content-Type", "text/html; charset=utf-8")]
--     "<html><head><meta name=\"robots\" content=\"noai, noimageai\"></head><body>Welcome</body></html>"
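
The ordering claim can be checked with a pure toy model in which an "application" simply returns the list of layers a request passed through. TraceApp, TraceMiddleware, layer, core, and composed are hypothetical names for this sketch and have nothing to do with WAI's actual types.

```haskell
-- Toy model: an application maps a request to the trace of layers it visited.
type TraceApp        = String -> [String]
type TraceMiddleware = TraceApp -> TraceApp

-- A middleware that records its own name, then delegates inward.
layer :: String -> TraceMiddleware
layer name inner req = name : inner req

-- The innermost application.
core :: TraceApp
core _req = ["application"]

-- Same composition shape as robotsTxtMiddleware . botBlockerMiddleware $ application.
composed :: TraceApp
composed = layer "robotsTxt" . layer "botBlocker" $ core
-- composed "GET /" evaluates to ["robotsTxt","botBlocker","application"]
```

The leftmost layer appears first in the trace because (.) makes it the outermost function, which is why robots.txt requests are answered before the bot check runs.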

Step 5 — Servant, Scotty, and Yesod integration

Every Haskell web framework on WAI exposes an Application value. The middleware wraps it unchanged — zero framework-specific modifications needed.

-- Integrating with Servant, Scotty, and Yesod
-- The middleware is framework-agnostic — all produce a WAI Application.

-- ---- Servant ----
-- import Servant
-- import Network.Wai.Handler.Warp (run)
-- import BotBlocker (botBlockerMiddleware)
-- import RobotsMiddleware (robotsTxtMiddleware)
--
-- type API = "hello" :> Get '[PlainText] Text
--
-- server :: Server API
-- server = return "Hello"
--
-- main :: IO ()
-- main = do
--   let app = robotsTxtMiddleware
--           . botBlockerMiddleware
--           $ serve (Proxy :: Proxy API) server
--   run 8080 app

-- ---- Scotty ----
-- import Web.Scotty (scottyApp, get, text)
-- import Network.Wai.Handler.Warp (run)
-- import BotBlocker (botBlockerMiddleware)
-- import RobotsMiddleware (robotsTxtMiddleware)
--
-- main :: IO ()
-- main = do
--   app <- scottyApp $ do
--     get "/" $ text "Hello"
--   run 8080 (robotsTxtMiddleware . botBlockerMiddleware $ app)

-- ---- Yesod ----
-- import Yesod
-- import Network.Wai.Handler.Warp (run)
-- import BotBlocker (botBlockerMiddleware)
-- import RobotsMiddleware (robotsTxtMiddleware)
--
-- data MyApp = MyApp
-- mkYesod "MyApp" [parseRoutes| / HomeR GET |]
-- instance Yesod MyApp
-- getHomeR :: Handler Html
-- getHomeR = defaultLayout [whamlet|<h1>Hello|]
--
-- main :: IO ()
-- main = do
--   app <- toWaiApp MyApp
--   run 8080 (robotsTxtMiddleware . botBlockerMiddleware $ app)

.cabal dependencies

-- my-app.cabal — dependencies

cabal-version:       3.0
name:                my-app
version:             0.1.0.0

executable my-app
  main-is:           Main.hs
  other-modules:     App, AiBots, BotBlocker, RobotsMiddleware
  hs-source-dirs:    app, src
  build-depends:
      base           >= 4.14 && < 5
    , wai            >= 3.2  && < 3.3
    , warp           >= 3.3  && < 3.4
    , http-types     >= 0.12 && < 0.13
    , bytestring     >= 0.11 && < 0.13
    , case-insensitive >= 1.2 && < 1.3
  default-language:  Haskell2010
  default-extensions:
    OverloadedStrings

WAI vs Servant vs Scotty vs Yesod

| Feature | WAI / Warp | Servant | Scotty | Yesod |
| --- | --- | --- | --- | --- |
| Middleware type | type Middleware = Application -> Application; pure function composition | Same WAI Middleware; Servant produces an Application, wrap it | Same WAI Middleware; scottyApp returns IO Application, wrap after extraction | Same WAI Middleware; toWaiApp returns IO Application, wrap after extraction |
| Short-circuit | Call respond with responseLBS status403; innerApp never called | Identical; middleware wraps the Servant Application | Identical; middleware wraps the Scotty Application | Identical, or use Yesod's built-in isAuthorized for auth-level checks |
| UA header access | lookup hUserAgent (requestHeaders req) :: Maybe ByteString; CI key, Maybe-safe | Same WAI requestHeaders in middleware; Servant Header combinator in handlers | Same WAI in middleware; header "User-Agent" :: ActionM (Maybe Text) in handlers | Same WAI in middleware; lookupHeader "User-Agent" :: HandlerFor site (Maybe ByteString) |
| CI ByteString | CI.mk "user-agent"; case-insensitive ByteString for header name lookup | Same; WAI headers use CI ByteString keys universally | Same at WAI layer; Scotty handler API uses Text | Same at WAI layer; Yesod's lookupHeader also takes a CI ByteString name |
| robots.txt | rawPathInfo req == "/robots.txt" check in robotsTxtMiddleware before bot-blocker | Same WAI middleware approach, or add Raw endpoint in Servant API | Same WAI middleware approach, or get "/robots.txt" handler in ScottyM | Same WAI middleware approach, or StaticR via yesod-static |
| Composition | f . g $ app; standard Haskell function composition, right-to-left wrapping | Same; serve api server gives Application, wrap freely | Same; scottyApp returns IO Application, compose after | Same; toWaiApp returns IO Application, compose after |

Summary

  • Middleware = Application -> Application: call respond with a 403 to short-circuit; call innerApp req respond' to pass through.
  • CI ByteString keys: WAI encodes case-insensitive header names in the type. lookup hUserAgent headers matches User-Agent, user-agent, and all other capitalisations.
  • mapResponseHeaders: applies a function to the response header list without rebuilding the response body. Use it to prepend X-Robots-Tag on pass-through.
  • Framework-agnostic: Servant, Scotty, and Yesod all expose a WAI Application. Wrap once, deploy anywhere.
  • Middleware composition with (.): in robotsTxtMiddleware . botBlockerMiddleware $ app, the leftmost middleware fires first. This is standard Haskell function composition.
