How to Block AI Bots in Akka HTTP

Akka HTTP is Lightbend's Scala HTTP toolkit built on Akka Streams. Its routing DSL is functional: routes are values composed with ~, and cross-cutting logic is expressed as directives. A natural primitive for a bot-blocking directive is mapInnerRoute combined with optionalHeaderValueByName. optionalHeaderValueByName("User-Agent") extracts the header as an Option[String]: None when it is absent, Some(value) when it is present. The robots.txt route must precede botBlocker in the route tree; if it sits inside the directive, AI bots hit the 403 before they can read the Disallow rules. The same patterns apply to Pekko HTTP (the Apache community fork) with only package name changes.

1. Bot detection

Pure Scala singleton object, no external dependencies. Lowercase the User-Agent string once, then test each known pattern with String.contains.

// AiBotDetector.scala — pure Scala, no dependencies
package middleware

object AiBotDetector {
  // Known AI crawler User-Agent substrings, all lowercase.
  private val patterns: List[String] = List(
    "gptbot",
    "chatgpt-user",
    "claudebot",
    "anthropic-ai",
    "ccbot",
    "google-extended",
    "cohere-ai",
    "meta-externalagent",
    "bytespider",
    "omgili",
    "diffbot",
    "imagesiftbot",
    "magpie-crawler",
    "amazonbot",
    "dataprovider",
    "netcraft",
  )

  /** Returns true when userAgent contains a known AI crawler token.
   *  An empty string (e.g. from a missing header) returns false.
   */
  def isAiBot(userAgent: String): Boolean =
    if (userAgent.isEmpty) false
    else {
      // Locale.ROOT keeps lowercasing independent of the JVM default locale.
      val lower = userAgent.toLowerCase(java.util.Locale.ROOT)
      patterns.exists(lower.contains)
    }
}
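Calling toLowerCase() with no argument uses the JVM default locale, which matters for substring matching. Under Turkish casing rules, uppercase I becomes dotless ı, so a User-Agent such as ImageSiftBot would no longer contain the imagesiftbot pattern. A small standalone check, assuming only the JDK (LocaleCheck and the sample User-Agent string are hypothetical):

```scala
import java.util.Locale

// Hypothetical demo: the same User-Agent lowercased under a Turkish locale
// versus locale-neutral rules (Locale.ROOT).
object LocaleCheck {
  // Hypothetical User-Agent value for illustration.
  val ua: String = "ImageSiftBot/1.0"

  // Turkish casing maps 'I' (U+0049) to dotless 'ı' (U+0131).
  val turkish: String = ua.toLowerCase(Locale.forLanguageTag("tr"))

  // Locale.ROOT uses locale-neutral rules, so 'I' becomes 'i'.
  val root: String = ua.toLowerCase(Locale.ROOT)

  def main(args: Array[String]): Unit = {
    println(turkish.contains("imagesiftbot")) // false
    println(root.contains("imagesiftbot"))    // true
  }
}
```

Passing Locale.ROOT explicitly to toLowerCase inside isAiBot keeps detection independent of whatever default locale the server happens to run with.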

2. Directive and server setup

mapInnerRoute builds a Directive0 from a function that receives the inner Route and returns a replacement Route. Inside it, optionalHeaderValueByName extracts the User-Agent as an Option[String]. On a match, complete(Forbidden, ...) short-circuits without running the inner route. Otherwise, mapResponseHeaders wraps the inner route so that X-Robots-Tag is appended to whatever response it completes with. The robots.txt route sits outside and before botBlocker.

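// Main.scala: Akka HTTP server with AI bot blocking
// build.sbt (akka-actor-typed is required for the typed ActorSystem below):
//   "com.typesafe.akka" %% "akka-actor-typed" % "2.8.5"
//   "com.typesafe.akka" %% "akka-http"        % "10.5.3"
//   "com.typesafe.akka" %% "akka-stream"      % "2.8.5"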

package server

import akka.actor.typed.ActorSystem
import akka.actor.typed.scaladsl.Behaviors
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{ContentTypes, HttpEntity, StatusCodes}
import akka.http.scaladsl.model.headers.RawHeader
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.{Directive0, Route}
import middleware.AiBotDetector

import scala.concurrent.ExecutionContextExecutor
import scala.io.StdIn

// Scala 2 does not allow top-level def/val definitions, so the directive,
// the robots.txt body, and the route tree are grouped in an object.
object Routes {

  // ── Bot-blocking directive ────────────────────────────────────────────────
  //
  // mapInnerRoute receives the inner Route and returns a new Route.
  // Use it when the directive must either:
  //   • short-circuit: complete(StatusCodes.Forbidden, ...)
  //   • pass through:  mapResponseHeaders(...)(inner)
  //
  // optionalHeaderValueByName("User-Agent") extracts the UA header as
  // Option[String]: None when absent, Some(value) when present.
  // .getOrElse("") converts it to a String; no null check is required.
  //
  // NOTE: complete(StatusCodes.Forbidden, headers, body) uses the overload
  // (StatusCode, immutable.Seq[HttpHeader], T) with an implicit
  // ToEntityMarshaller[T]. For a String body, the marshaller sets
  // Content-Type: text/plain; charset=UTF-8.

  def botBlocker: Directive0 = mapInnerRoute { inner =>
    optionalHeaderValueByName("User-Agent") { uaOpt =>
      val ua = uaOpt.getOrElse("")

      if (AiBotDetector.isAiBot(ua)) {
        // Block: respond with 403 + X-Robots-Tag. Do NOT invoke inner.
        complete(
          StatusCodes.Forbidden,
          List(RawHeader("X-Robots-Tag", "noai, noimageai")),
          "Forbidden: AI crawlers are not permitted",
        )
      } else {
        // Pass: run inner and append X-Robots-Tag to the response it
        // completes with. Rejections (e.g. an unmatched path that later
        // becomes a 404) pass through without the header.
        mapResponseHeaders(_ :+ RawHeader("X-Robots-Tag", "noai, noimageai"))(inner)
      }
    }
  }

  // ── Robots.txt ────────────────────────────────────────────────────────────

  val robotsTxt: String =
    """User-agent: *
      |Allow: /
      |
      |User-agent: GPTBot
      |Disallow: /
      |
      |User-agent: ClaudeBot
      |Disallow: /
      |
      |User-agent: CCBot
      |Disallow: /
      |
      |User-agent: Google-Extended
      |Disallow: /""".stripMargin

  // ── Route tree ────────────────────────────────────────────────────────────
  //
  // CRITICAL: the robots.txt route MUST appear BEFORE botBlocker in the
  // route tree. Akka HTTP tries alternatives left-to-right with ~.
  // If /robots.txt sits inside botBlocker, AI bots hit the 403 before
  // they can read the Disallow rules, which defeats the purpose.
  //
  // JSON bodies use HttpEntity with an explicit content type; a bare
  // String would be served as text/plain.

  val route: Route =
    path("robots.txt") {                         // ← always allow
      get { complete(robotsTxt) }
    } ~
    botBlocker {                                 // ← all other routes protected
      pathSingleSlash {
        get { complete(HttpEntity(ContentTypes.`application/json`, """{"message":"ok"}""")) }
      } ~
      path("api" / "data") {
        get { complete(HttpEntity(ContentTypes.`application/json`, """{"data":"value"}""")) }
      }
    }
}

// ── Server startup ───────────────────────────────────────────────────────────

object Main extends App {
  implicit val system: ActorSystem[Nothing] =
    ActorSystem(Behaviors.empty, "ai-bot-blocker")
  implicit val ec: ExecutionContextExecutor = system.executionContext

  val bindingFuture = Http().newServerAt("0.0.0.0", 8080).bind(Routes.route)
  println("Server online at http://0.0.0.0:8080/ (press RETURN to stop)")
  StdIn.readLine()
  bindingFuture.flatMap(_.unbind()).onComplete(_ => system.terminate())
}
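The ordering rule can be checked without starting a server. The sketch below is a simplified model of route alternatives, not Akka HTTP's API: a route is a function returning Option[Int] (a status code), None stands in for a rejection, and Option.orElse plays the role of ~ by trying the next alternative only when the previous one rejects. Req, OrderingModel, and the single-token isAiBot stand-in are hypothetical names for illustration.

```scala
// Simplified model of Akka HTTP route alternatives (NOT Akka's API).
// A route either completes with a status code (Some) or rejects (None).
final case class Req(path: String, userAgent: String)

object OrderingModel {
  type Route = Req => Option[Int]

  // Hypothetical stand-in for AiBotDetector: one lowercase token.
  def isAiBot(ua: String): Boolean =
    ua.toLowerCase(java.util.Locale.ROOT).contains("gptbot")

  // Model of `a ~ b`: evaluate b only if a rejected.
  def concat(a: Route, b: Route): Route = req => a(req).orElse(b(req))

  // Model of botBlocker: 403 for AI bots, otherwise defer to inner.
  def blocker(inner: Route): Route =
    req => if (isAiBot(req.userAgent)) Some(403) else inner(req)

  val robots: Route = req => if (req.path == "/robots.txt") Some(200) else None
  val home: Route   = req => if (req.path == "/") Some(200) else None

  // robots.txt outside and before the blocker (the recommended order).
  val correct: Route = concat(robots, blocker(home))
  // robots.txt nested inside the blocker: bots never reach it.
  val wrong: Route = blocker(concat(robots, home))

  def main(args: Array[String]): Unit = {
    val bot = "Mozilla/5.0 (compatible; GPTBot/1.0)"
    println(correct(Req("/robots.txt", bot))) // Some(200)
    println(wrong(Req("/robots.txt", bot)))   // Some(403)
    println(correct(Req("/", bot)))           // Some(403)
  }
}
```

In real Akka HTTP, a request rejected by every alternative is turned into an error response such as 404 by Akka's rejection handling; the model simply returns None for that case.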

3. Scoped protection — /api/* only

Wrap only the pathPrefix("api") subtree in botBlocker. Routes outside /api, including / and /robots.txt, remain reachable by AI crawlers.

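// Scope bot blocking to /api/* only; public routes are unprotected.
// This route tree replaces the one in Main.scala and reuses its imports,
// robotsTxt, and botBlocker.
//
// pathPrefix("api") matches any path starting with /api.
// botBlocker is applied only inside that prefix.

import akka.http.scaladsl.model.{ContentTypes, HttpEntity}

val route: Route =
  path("robots.txt") {
    get { complete(robotsTxt) }
  } ~
  pathSingleSlash {
    // Public: no bot blocking
    get { complete(HttpEntity(ContentTypes.`application/json`, """{"message":"ok"}""")) }
  } ~
  pathPrefix("api") {
    botBlocker {                           // only /api/* is protected
      path("data") {
        get { complete(HttpEntity(ContentTypes.`application/json`, """{"data":"value"}""")) }
      } ~
      path("users") {
        get { complete(HttpEntity(ContentTypes.`application/json`, """{"users":[]}""")) }
      }
    }
  }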

4. Pekko HTTP (Apache fork)

Pekko HTTP is the Apache-licensed community fork created after Lightbend moved Akka to BSL in September 2022. The directive API is word-for-word identical — only import paths change.

// Pekko HTTP — Apache community fork of Akka HTTP (BSL-free).
// API is identical; only package names differ.
//
// Replace in build.sbt (pekko-actor-typed is needed for the typed ActorSystem):
//   "com.typesafe.akka" %% "akka-actor-typed" → "org.apache.pekko" %% "pekko-actor-typed" % "1.1.0"
//   "com.typesafe.akka" %% "akka-http"        → "org.apache.pekko" %% "pekko-http"        % "1.1.0"
//   "com.typesafe.akka" %% "akka-stream"      → "org.apache.pekko" %% "pekko-stream"      % "1.1.0"
//
// Replace in source imports (every akka.* package becomes org.apache.pekko.*):
//   akka.actor.typed   → org.apache.pekko.actor.typed
//   akka.http.scaladsl → org.apache.pekko.http.scaladsl
//
// The botBlocker directive, route DSL, and all directive combinators
// (mapInnerRoute, optionalHeaderValueByName, mapResponseHeaders) are
// word-for-word identical between Akka HTTP and Pekko HTTP.

import org.apache.pekko.actor.typed.ActorSystem
import org.apache.pekko.actor.typed.scaladsl.Behaviors
import org.apache.pekko.http.scaladsl.Http
import org.apache.pekko.http.scaladsl.model.StatusCodes
import org.apache.pekko.http.scaladsl.model.headers.RawHeader
import org.apache.pekko.http.scaladsl.server.Directives._
import org.apache.pekko.http.scaladsl.server.{Directive0, Route}

// botBlocker implementation is identical to the Akka HTTP version above.

Key points

Framework comparison: JVM HTTP middleware models

| Framework | Middleware model | Block request | Header access |
|---|---|---|---|
| Akka HTTP | Directive0 / mapInnerRoute | complete(Forbidden, headers, body) | optionalHeaderValueByName → Option[String] |
| Spring Boot | OncePerRequestFilter / @WebFilter | response.sendError(403); skip filterChain.doFilter | request.getHeader() → null when absent |
| Vert.x Web | Route.handler() (event-loop chain) | ctx.fail(403); skip ctx.next() | ctx.request().getHeader() → null when absent |
| Play Framework | Filter (an EssentialFilter) / ActionBuilder | Return Future.successful(Forbidden(...)) without calling next | req.headers.get() → Option[String] |

Akka HTTP and Play Framework both represent a possibly missing header as Option[String], so the type system makes you handle the absent case explicitly. Spring Boot and Vert.x return null for missing headers, so always guard with a null check. Akka HTTP's directive model is the most compositional of the four: directives are first-class values that can be combined, tested in isolation, and reused across route trees.
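When Scala code calls a Java API that returns null for a missing header, as the Spring and Vert.x getters in the table do, wrapping the result in Option(...) restores the explicit None/Some handling. A minimal sketch, using a plain java.util.HashMap as a hypothetical stand-in for a request's header lookup:

```scala
import java.util.{HashMap => JHashMap}

object NullBridge {
  // Hypothetical stand-in for a Java header lookup whose get() returns
  // null when the key is absent, like HttpServletRequest.getHeader.
  val headers = new JHashMap[String, String]()
  headers.put("Accept", "text/html")

  // Option(x) is None when x is null and Some(x) otherwise.
  def userAgent: Option[String] = Option(headers.get("User-Agent"))

  // Same fallback as botBlocker: a missing header becomes "".
  def userAgentOrEmpty: String = userAgent.getOrElse("")

  def main(args: Array[String]): Unit = {
    println(userAgent)                // None
    println(userAgentOrEmpty.isEmpty) // true
  }
}
```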