How to Block AI Bots on http4s (Scala): Complete 2026 Guide

http4s is a purely functional Scala HTTP library built on Cats Effect and FS2. Routes are HttpRoutes[F], a Kleisli from Request[F] to OptionT[F, Response[F]]. Middleware is a function HttpRoutes[F] => HttpRoutes[F]: wrap the routes in a new Kleisli that checks the User-Agent and, for AI bots, short-circuits without calling the inner routes.

Short-circuit = don't call inner routes

In the bot-blocking middleware, returning OptionT.some(forbiddenResponse) immediately — without calling routes(req) — means the inner HttpRoutes[F] never executes. No pattern matching, no database calls, no template rendering. For legitimate requests, call routes(req).map(_.putHeaders(...)) to add the X-Robots-Tag to every response in one place.
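The short-circuit can be modelled without any library. The sketch below is a simplified stand-in, not the http4s API: Req, Resp, Routes and the innerCalls counter are hypothetical names, and Option plays the role of OptionT[F, *]. It only illustrates that the wrapped function is never invoked for a blocked User-Agent.

```scala
object ShortCircuitModel {
  // Simplified stand-ins for Request[F] and Response[F] (assumptions for illustration).
  final case class Req(userAgent: String, path: String)
  final case class Resp(status: Int, headers: Map[String, String] = Map.empty)

  // Stand-in for HttpRoutes[F]: None means "no route matched".
  type Routes = Req => Option[Resp]

  // Counts how often the inner routes actually run.
  var innerCalls: Int = 0

  val inner: Routes = req => {
    innerCalls += 1
    if (req.path == "/") Some(Resp(200)) else None
  }

  // Stand-in for the middleware shape Routes => Routes.
  def blockBots(routes: Routes): Routes = req =>
    if (req.userAgent.toLowerCase.contains("gptbot"))
      Some(Resp(403)) // short-circuit: `routes` is never invoked
    else
      routes(req).map(r => r.copy(headers = r.headers + ("X-Robots-Tag" -> "noai, noimageai")))

  val guarded: Routes = blockBots(inner)
}
```

Calling ShortCircuitModel.guarded with a GPTBot User-Agent leaves innerCalls at 0, while a browser User-Agent increments it and receives the extra header. An unmatched path still yields None, which is what lets a later route group or .orNotFound handle it.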

Protection layers

1. robots.txt: StaticFile.fromPath, FileService, or an inline Ok() route, placed outside BotBlockerMiddleware in the composition
2. noai meta tag: in HTML string responses or Twirl/Scalatags templates
3. X-Robots-Tag header: routes(req).map(_.putHeaders(xRobotsTag)), appended to all legitimate responses in the middleware
4. Hard 403, global composition: (robotsRoutes <+> BotBlockerMiddleware(appRoutes)).orNotFound, where every application route sits behind the blocker and robots.txt stays readable
5. Hard 403, scoped composition: (publicRoutes <+> BotBlockerMiddleware(apiRoutes)).orNotFound, where public routes stay open and the blocker guards whatever falls through to the API group

Step 1 — Bot list (AiBots.scala)

A plain List[String]: immutable and allocated once, when the AiBots object is first initialised. exists short-circuits on the first match, and lowercasing the User-Agent before checking makes the match case-insensitive.

// AiBots.scala — shared bot detection

package myapp

object AiBots {
  private val patterns: List[String] = List(
    // OpenAI
    "gptbot", "chatgpt-user", "oai-searchbot",
    // Anthropic
    "claudebot", "claude-web",
    // Common Crawl
    "ccbot",
    // Bytedance
    "bytespider",
    // Meta
    "meta-externalagent",
    // Perplexity
    "perplexitybot",
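    // Google AI (Google-Extended is a robots.txt control token; Google documents no
    // separate User-Agent string for it, so that entry is only a safety net)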
    "google-extended", "googleother",
    // Cohere
    "cohere-ai",
    // Amazon
    "amazonbot",
    // Diffbot
    "diffbot",
    // AI2
    "ai2bot",
    // DeepSeek
    "deepseekbot",
    // Mistral
    "mistralai-user",
    // xAI
    "xai-bot",
    // You.com
    "youbot",
    // DuckDuckGo AI
    "duckassistbot",
  )

  def isAiBot(userAgent: String): Boolean = {
    val ua = userAgent.toLowerCase
    patterns.exists(ua.contains)
  }
}
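A quick way to see what substring matching does and does not catch is to run the predicate over a few User-Agent strings. The sketch below is self-contained, so it repeats the predicate with an abbreviated pattern list. AiBotsCheck and the sample strings are illustrative assumptions, not captured traffic.

```scala
object AiBotsCheck {
  // Abbreviated copy of the pattern list, only so this sketch runs on its own.
  private val patterns: List[String] = List("gptbot", "claudebot", "ccbot", "bytespider")

  def isAiBot(userAgent: String): Boolean = {
    val ua = userAgent.toLowerCase
    patterns.exists(p => ua.contains(p))
  }

  // Illustrative User-Agent strings (assumed shapes).
  val botUa      = "Mozilla/5.0 (compatible; GPTBot/1.0)"
  val shoutingUa = "CLAUDEBOT/1.0"
  val browserUa  = "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"
  val lookalike  = "AcmeCCBotChecker/2.0" // contains "ccbot" as a substring

  def main(args: Array[String]): Unit =
    List(botUa, shoutingUa, browserUa, "", lookalike).foreach { ua =>
      println(s"'$ua' blocked: ${isAiBot(ua)}")
    }
}
```

The lookalike case is the trade-off of substring matching: any User-Agent that merely contains a token is blocked. An empty or missing User-Agent is not blocked, because no pattern matches the empty string.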

Step 2 — Bot-blocking middleware (Kleisli)

The middleware takes routes: HttpRoutes[F] and returns a new Kleisli. Inside, check the User-Agent using CIString — it is case-insensitive and matches regardless of how the client capitalised the header name.

// BotBlockerMiddleware.scala — http4s Kleisli middleware

package myapp

import cats.Monad
import cats.data.{Kleisli, OptionT}
import cats.syntax.all.*
import org.http4s.*
import org.http4s.Status.Forbidden
import org.typelevel.ci.CIString

object BotBlockerMiddleware {

  private val xRobotsTag = Header.Raw(
    CIString("X-Robots-Tag"),
    "noai, noimageai",
  )

  /** Middleware that wraps HttpRoutes[F] and blocks AI bots with 403.
   *
   * Type: HttpRoutes[F] => HttpRoutes[F]
   * HttpRoutes[F] = Kleisli[OptionT[F, *], Request[F], Response[F]]
   *
   * For AI bots: returns OptionT.some(403 response) — inner routes never called.
   * For legit:   calls inner routes(req) and appends X-Robots-Tag header.
   */
  def apply[F[_]: Monad](routes: HttpRoutes[F]): HttpRoutes[F] =
    Kleisli { (req: Request[F]) =>
      // CIString is case-insensitive — matches "User-Agent", "user-agent", etc.
      val ua = req.headers
        .get(CIString("User-Agent"))
        .map(_.head.value)
        .getOrElse("")

      if (AiBots.isAiBot(ua)) {
        // Short-circuit — inner routes never run
        OptionT.some[F](
          Response[F](Forbidden)
            .putHeaders(
              xRobotsTag,
              Header.Raw(CIString("Content-Type"), "text/plain; charset=utf-8"),
            )
            .withEntity("Forbidden"),
        )
      } else {
        // Pass through and add X-Robots-Tag to all legitimate responses
        routes(req).map(_.putHeaders(xRobotsTag))
      }
    }
}

Step 3 — Route definitions and <+> composition

Place robotsRoutes and /health outside the bot-blocker. Compose with <+> (SemigroupK — first match wins) then call .orNotFound to produce the final HttpApp[F] required by the server.

// AppRoutes.scala — route definitions and composition

package myapp

import cats.effect.IO
import cats.syntax.all.*        // provides <+> (SemigroupK syntax)
import org.http4s.*
import org.http4s.dsl.io.*
import org.http4s.syntax.all.*  // provides .orNotFound

object AppRoutes {

  // robots.txt — must be OUTSIDE the bot-blocker so crawlers can read it
  val robotsRoutes: HttpRoutes[IO] = HttpRoutes.of[IO] {
    case GET -> Root / "robots.txt" =>
      Ok(
        """User-agent: *
          |Allow: /
          |User-agent: GPTBot
          |Disallow: /
          |User-agent: ClaudeBot
          |Disallow: /
          |User-agent: CCBot
          |Disallow: /
          |User-agent: Bytespider
          |Disallow: /
          |User-agent: Google-Extended
          |Disallow: /
          |User-agent: PerplexityBot
          |Disallow: /
          |User-agent: Meta-ExternalAgent
          |Disallow: /
          |User-agent: AmazonBot
          |Disallow: /
          |""".stripMargin
      ) // String bodies default to Content-Type: text/plain; charset=UTF-8

    case GET -> Root / "health" =>
      Ok("ok")
  }

  // Protected routes — wrapped in BotBlockerMiddleware
  val protectedRoutes: HttpRoutes[IO] = HttpRoutes.of[IO] {
    case GET -> Root =>
      Ok(
        """<!DOCTYPE html>
          |<html><head>
          |  <meta name="robots" content="noai, noimageai">
          |  <title>My Site</title>
          |</head><body><h1>Welcome</h1></body></html>
          |""".stripMargin,
        headers.`Content-Type`(MediaType.text.html),
      )

    case GET -> Root / "api" / "data" =>
      Ok("""{"data": "protected"}""", headers.`Content-Type`(MediaType.application.json))
  }

  // Compose: robots (unblocked) + protected (bot-blocked)
  // <+> is SemigroupK.combine — tries routes in order, first match wins
  val app: HttpApp[IO] =
    (robotsRoutes <+> BotBlockerMiddleware(protectedRoutes)).orNotFound
}
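To make "first match wins" concrete, here is a small sketch with two route tables that both claim the same path. The route names, the get helper and the scala-cli directives at the top are assumptions for illustration. The library version matches the build.sbt used in this guide.

```scala
//> using scala 3.4.0
//> using dep org.http4s::http4s-dsl:0.23.27

import cats.effect.IO
import cats.effect.unsafe.implicits.global
import cats.syntax.all.*
import org.http4s.*
import org.http4s.dsl.io.*
import org.http4s.implicits.*

object FirstMatchWins {
  // Two hypothetical route tables that both claim /ping.
  val first: HttpRoutes[IO] = HttpRoutes.of[IO] {
    case GET -> Root / "ping" => Ok("first")
  }

  val second: HttpRoutes[IO] = HttpRoutes.of[IO] {
    case GET -> Root / "ping"        => Ok("second")
    case GET -> Root / "only-second" => Ok("second")
  }

  // `first` is consulted before `second`; unmatched requests become 404.
  val app: HttpApp[IO] = (first <+> second).orNotFound

  // Runs one GET through the app and returns (status code, body).
  def get(path: String): (Int, String) =
    app
      .run(Request[IO](method = Method.GET, uri = Uri.unsafeFromString(path)))
      .flatMap(resp => resp.as[String].map(body => (resp.status.code, body)))
      .unsafeRunSync()
}
```

A request for /ping is answered by first, /only-second falls through to second, and anything else reaches .orNotFound. This is why robotsRoutes must appear on the left of <+>: a crawler fetching /robots.txt is answered before the bot-blocker is ever consulted.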

Step 4 — EmberServer setup (build.sbt)

EmberServer is the recommended http4s server backend — pure Scala, HTTP/2, and WebSocket support. Extend IOApp.Simple for the entry point; Cats Effect handles the runtime.

// Main.scala — EmberServer setup (http4s + Cats Effect)

package myapp

import cats.effect.{IO, IOApp}
import com.comcast.ip4s.*
import org.http4s.ember.server.EmberServerBuilder

object Main extends IOApp.Simple {

  override def run: IO[Unit] =
    EmberServerBuilder
      .default[IO]
      .withHost(ipv4"0.0.0.0")
      .withPort(port"8080")
      .withHttpApp(AppRoutes.app)
      .build
      .useForever
}

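// build.sbt: dependencies (http4s-ember-client and log4cats-noop are not used by the
// code in this guide and should be optional on 0.23.x)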
// scalaVersion := "3.4.0"
//
// libraryDependencies ++= Seq(
//   "org.http4s"    %% "http4s-ember-server"  % "0.23.27",
//   "org.http4s"    %% "http4s-ember-client"  % "0.23.27",
//   "org.http4s"    %% "http4s-dsl"           % "0.23.27",
//   "org.typelevel" %% "cats-effect"          % "3.5.4",
//   "org.typelevel" %% "log4cats-noop"        % "2.7.0",
// )

Step 5 — robots.txt via StaticFile or FileService

StaticFile.fromPath streams a file from the filesystem without loading it into memory — efficient for large files. FileService serves an entire directory. Both integrate as HttpRoutes[F] and compose with <+> like any other routes.

// Serving robots.txt from the filesystem with StaticFile

import cats.effect.IO
import fs2.io.file.Path
import org.http4s.*
import org.http4s.dsl.io.*
import org.http4s.server.staticcontent.{FileService, fileService}

// Option A: StaticFile — single file from path
val robotsRoute: HttpRoutes[IO] = HttpRoutes.of[IO] {
  case req @ GET -> Root / "robots.txt" =>
    StaticFile
      .fromPath[IO](Path("src/main/resources/robots.txt"), Some(req))
      .getOrElseF(NotFound())
}

// Option B: FileService — serve an entire directory
// Serves all files under src/main/resources/public/ at their path names
val staticRoutes: HttpRoutes[IO] =
  fileService[IO](FileService.Config("src/main/resources/public"))

// Option C: Inline string (compile-time embedded)
// Defined before the route that uses it. String bodies are sent as
// text/plain; charset=UTF-8 by default, so no explicit Content-Type is needed.
private val ROBOTS_TXT_CONTENT = """User-agent: *
Allow: /
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
"""

val robotsInline: HttpRoutes[IO] = HttpRoutes.of[IO] {
  case GET -> Root / "robots.txt" =>
    Ok(ROBOTS_TXT_CONTENT)
}

Step 6 — Scoped protection (public routes + bot-blocked API)

Apply BotBlockerMiddleware only to the routes that need protection. Unprotected routes (health, robots.txt, home page) compose with <+> before the protected group.

// Scoped middleware — protect only /api/* routes

import cats.effect.IO
import cats.syntax.all.*        // provides <+> (SemigroupK syntax)
import org.http4s.*
import org.http4s.dsl.io.*
import org.http4s.syntax.all.*  // provides .orNotFound

// Public routes — no bot-blocker
val publicRoutes: HttpRoutes[IO] = HttpRoutes.of[IO] {
  case GET -> Root / "health" => Ok("ok")
  case GET -> Root / "robots.txt" => Ok(ROBOTS_TXT_CONTENT) // text/plain by default
  case GET -> Root => Ok("<html>...</html>", headers.`Content-Type`(MediaType.text.html))
}

// API routes — wrapped individually in bot-blocker
val apiRoutes: HttpRoutes[IO] = HttpRoutes.of[IO] {
  case GET -> Root / "api" / "data"  =>
    Ok("""{"data":"protected"}""", headers.`Content-Type`(MediaType.application.json))
  case GET -> Root / "api" / "users" =>
    Ok("[]", headers.`Content-Type`(MediaType.application.json))
}

// Compose: public (unblocked) + api (bot-blocked)
// The <+> combinator tries routes in order — first match wins.
val app: HttpApp[IO] =
  (publicRoutes <+> BotBlockerMiddleware(apiRoutes)).orNotFound
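One consequence of this composition is worth spelling out. The middleware decides on 403 before it consults the inner routes, so a bot request that falls through publicRoutes receives 403 even for a path that apiRoutes does not define. If the block should apply strictly to a URL prefix, one option is to mount the wrapped routes under org.http4s.server.Router. The sketch below assumes the http4s-server module is on the classpath. blockBots is a hypothetical one-pattern stand-in for BotBlockerMiddleware so the example runs on its own, and the scala-cli directives are likewise assumptions.

```scala
//> using scala 3.4.0
//> using dep org.http4s::http4s-dsl:0.23.27
//> using dep org.http4s::http4s-server:0.23.27

import cats.data.{Kleisli, OptionT}
import cats.effect.IO
import cats.effect.unsafe.implicits.global
import cats.syntax.all.*
import org.http4s.*
import org.http4s.dsl.io.*
import org.http4s.implicits.*
import org.http4s.server.Router
import org.typelevel.ci.CIString

object PrefixScoped {
  // Hypothetical one-pattern blocker, standing in for BotBlockerMiddleware.
  def blockBots(routes: HttpRoutes[IO]): HttpRoutes[IO] =
    Kleisli { (req: Request[IO]) =>
      val ua = req.headers.get(CIString("User-Agent")).map(_.head.value).getOrElse("")
      if (ua.toLowerCase.contains("gptbot")) OptionT.some[IO](Response[IO](Status.Forbidden))
      else routes(req)
    }

  val publicRoutes: HttpRoutes[IO] = HttpRoutes.of[IO] {
    case GET -> Root / "health" => Ok("ok")
  }

  // Router strips the "/api" prefix before matching, so patterns start at Root.
  val apiRoutes: HttpRoutes[IO] = HttpRoutes.of[IO] {
    case GET -> Root / "data" => Ok("""{"data":"protected"}""")
  }

  // Only requests whose path starts with /api reach the blocker.
  val app: HttpApp[IO] =
    (publicRoutes <+> Router("/api" -> blockBots(apiRoutes))).orNotFound

  // Sends one GET with the given User-Agent and returns the status code.
  def status(path: String, userAgent: String): Int =
    app
      .run(
        Request[IO](
          uri = Uri.unsafeFromString(path),
          headers = Headers(Header.Raw(CIString("User-Agent"), userAgent)),
        )
      )
      .map(_.status.code)
      .unsafeRunSync()
}
```

With this layout a bot gets 403 for anything under /api, 200 for /health, and an ordinary 404 for unknown paths outside /api. Whether that is preferable to a blanket 403 is a policy choice; the blanket version reveals less about which paths exist.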

http4s vs Play Framework vs Akka HTTP vs ZIO HTTP

| Feature | http4s | Play Framework | Akka HTTP | ZIO HTTP |
| --- | --- | --- | --- | --- |
| Middleware type | HttpRoutes[F] => HttpRoutes[F] (Kleisli composition) | EssentialFilter: RequestHeader + raw bytes → short-circuit before body | Directive[T]: composable extractor and transformer, reject() to block | HttpMiddleware[R, E] = Http[R, E, Request, Response] => Http[R, E, Request, Response] |
| Short-circuit | OptionT.some(Response[F](Status.Forbidden)), inner routes never called | Accumulator.done(Results.Forbidden), body never read | reject(ValidationRejection("AI bot")) or complete(StatusCodes.Forbidden) | ZIO.succeed(Response.status(Status.Forbidden)) without calling next |
| Route composition | <+> (SemigroupK) combines routes; first match wins | Router.orElse or Action composition | Directive concatenation with ~ or path matching | Http.collect routes composed with ++ |
| Header access | req.headers.get(CIString("User-Agent")).map(_.head.value) | request.headers.get("User-Agent") | optionalHeaderValueByName("User-Agent") | request.header(Header.UserAgent).map(_.renderedValue) |
| robots.txt | StaticFile.fromPath, FileService, or inline Ok() handler | Assets controller or explicit Action | getFromFile("robots.txt") directive | Http.fromFile or explicit route handler |
| Effect type | F[_]: Concurrent (tagless final; works with any Cats Effect 3 effect, such as IO, or ZIO via zio-interop-cats) | Future[Result] or Action[A] | Future[T], backed by the Akka actor system | ZIO[R, E, A]: environment, error, value |
| HTTP server | EmberServerBuilder (default), BlazeServerBuilder, or JettyBuilder | Akka HTTP (default since Play 2.6) or Netty | Akka HTTP's own server, built on Akka Streams | ZIO HTTP built-in server |

Summary

  • Kleisli middleware = HttpRoutes[F] => HttpRoutes[F] — wrap inner routes; for AI bots return OptionT.some(403) directly.
  • CIString for header names — case-insensitive by spec. Always wrap header name strings in CIString(...).
  • <+> composition — place robots.txt and health routes before bot-blocker in the chain.
  • .map(_.putHeaders(xRobotsTag)) — adds X-Robots-Tag to all legitimate responses in one place.
  • .orNotFound — converts HttpRoutes[F] (can return None) to HttpApp[F] (always returns a response) — required by the server builder.
