How to Block AI Bots in Scala Scalatra

Scalatra is a Sinatra-inspired Scala web framework built on the Java Servlet API, running on embedded Jetty. It is lightweight and direct — routes are defined as closures, and a before() filter fires before every route handler. The Scalatra-specific detail for bot blocking: short-circuiting is done with halt(), a DSL method that throws a HaltException internally — Scalatra catches it and renders the provided status and body. Because halt() throws, no return is needed after it. Header access uses the underlying servlet API: request.getHeader() returns null when absent — wrap it in Option().getOrElse("") for idiomatic Scala null safety.

1. Bot detection object

A Scala singleton object with no external dependencies. String.contains() performs literal substring matching. toLowerCase is applied once before the exists check — no regex engine involved.

// src/main/scala/com/example/AiBotDetector.scala
package com.example

object AiBotDetector {

  // All lowercase — matched against ua.toLowerCase
  private val patterns: Seq[String] = Seq(
    "gptbot",
    "chatgpt-user",
    "claudebot",
    "anthropic-ai",
    "ccbot",
    "google-extended",
    "cohere-ai",
    "meta-externalagent",
    "bytespider",
    "omgili",
    "diffbot",
    "imagesiftbot",
    "magpie-crawler",
    "amazonbot",
    "dataprovider",
    "netcraft"
  )

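  def isAiBot(ua: String): Boolean = {
    if (ua == null || ua.isEmpty) false
    else {
      // Locale.ROOT keeps lowercasing stable under locales such as Turkish,
      // where "I".toLowerCase would otherwise produce a dotless "ı"
      val lower = ua.toLowerCase(java.util.Locale.ROOT)
      // String.contains() is a literal substring test, no regex
      patterns.exists(p => lower.contains(p))
    }
  }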
}
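A quick way to sanity-check the matching logic is to run it against a few sample headers. The sketch below is self-contained and uses a shortened, hypothetical subset of the pattern list; the User-Agent strings are illustrative samples, not captured traffic.

```scala
import java.util.Locale

object DetectorCheck {
  // Shortened subset of the pattern list, for illustration only
  private val patterns = Seq("gptbot", "claudebot", "ccbot")

  // Same approach as AiBotDetector: lowercase once, then literal substring tests
  def isAiBot(ua: String): Boolean =
    Option(ua).exists { s =>
      val lower = s.toLowerCase(Locale.ROOT)
      patterns.exists(p => lower.contains(p))
    }

  def main(args: Array[String]): Unit = {
    val samples = Seq(
      "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
      "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
      ""
    )
    // Print the verdict next to each sample header
    samples.foreach(ua => println(s"${isAiBot(ua)}\t$ua"))
  }
}
```

Because the match is a plain substring test, any header that happens to contain one of the tokens is blocked, so it is worth keeping the tokens specific.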

2. ScalatraServlet with before() filter

Extend ScalatraServlet and add a before() block. request.getHeader() returns null when absent — use Option(...).getOrElse(""). Set response headers on the response object before calling halt() — headers must be set before the exception is thrown.

// src/main/scala/com/example/MyServlet.scala
package com.example

import org.scalatra._

class MyServlet extends ScalatraServlet {

  // ── Global before() filter ────────────────────────────────────────────────
  // Fires before every route handler in this servlet.
  // halt() stops execution immediately — no return needed after it.
  before() {
    // Path guard: skip the bot check for robots.txt. A before() block in the
    // servlet body is not a method, so `return` does not compile here;
    // branch with a conditional instead.
    if (request.getPathInfo != "/robots.txt") {
      // request.getHeader() returns null when the header is absent.
      // Option(...).getOrElse("") converts null to an empty string safely.
      val ua: String = Option(request.getHeader("User-Agent")).getOrElse("")

      if (AiBotDetector.isAiBot(ua)) {
        // Set response headers on the blocked response
        response.setHeader("X-Robots-Tag", "noai, noimageai")
        response.setContentType("text/plain")
        // halt() throws HaltException; Scalatra catches it and renders
        // the provided status and body. Code after halt() does not run.
        halt(403, "Forbidden")
      }

      // Pass-through: inject X-Robots-Tag on all non-blocked responses
      response.setHeader("X-Robots-Tag", "noai, noimageai")
    }
  }

  // ── Routes ─────────────────────────────────────────────────────────────────
  get("/") {
    contentType = "application/json"
    """{"message": "Hello"}"""
  }

  get("/api/data") {
    contentType = "application/json"
    """{"data": "value"}"""
  }

  get("/robots.txt") {
    contentType = "text/plain"
    """User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /"""
  }
}
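The reason nothing after halt() runs can be illustrated with a toy model of exception-based short-circuiting. This is a sketch of the general idea only, not Scalatra's internals: Halted, halt and dispatch below are hypothetical stand-ins written for this example.

```scala
import java.util.Locale
import scala.util.control.NoStackTrace

object HaltModel {
  // Hypothetical stand-in for Scalatra's HaltException (not the library class)
  final case class Halted(status: Int, body: String)
      extends RuntimeException with NoStackTrace

  // Returning Nothing tells the compiler that nothing after halt() runs
  def halt(status: Int, body: String): Nothing = throw Halted(status, body)

  // Toy dispatcher: run the filter, then the route; a halt skips the route
  def dispatch(userAgent: String)(route: => String): (Int, String) =
    try {
      if (userAgent.toLowerCase(Locale.ROOT).contains("gptbot")) halt(403, "Forbidden")
      (200, route)
    } catch {
      // The dispatcher, not the filter, turns the exception into a response
      case Halted(status, body) => (status, body)
    }
}
```

In this model the route is passed by name, so it is never evaluated when the filter halts, which mirrors how a blocked request never reaches its route handler.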

3. ScalatraBootstrap — servlet registration

Scalatra discovers ScalatraBootstrap by convention at startup: ScalatraListener looks for a class with that name in the default (root) package, which is why the file below has no package declaration. Register servlets with context.mount(new MyServlet, "/*"). Multiple servlets can be mounted at different path prefixes.

// src/main/scala/ScalatraBootstrap.scala
// LifeCycle class that registers servlets with the container.
// Kept in the default package so ScalatraListener can find it by name.
import com.example._
import org.scalatra._
import javax.servlet.ServletContext

class ScalatraBootstrap extends LifeCycle {
  override def init(context: ServletContext): Unit = {
    // Mount the servlet at all paths
    context.mount(new MyServlet, "/*")
  }

  override def destroy(context: ServletContext): Unit = {
    // Cleanup — close DB connections, stop scheduled tasks
  }
}

4. Embedded Jetty launcher

Scalatra is typically run with embedded Jetty for standalone deployment. The ScalatraListener discovers ScalatraBootstrap and mounts all registered servlets. Build a fat JAR with sbt assembly and run directly.

// src/main/scala/com/example/JettyLauncher.scala
// Embedded Jetty server — run as a standalone application.
import org.eclipse.jetty.server.Server
import org.eclipse.jetty.webapp.WebAppContext
import org.scalatra.servlet.ScalatraListener

object JettyLauncher extends App {
  val port = sys.env.getOrElse("PORT", "8080").toInt
  val server = new Server(port)

  val context = new WebAppContext()
  context.setContextPath("/")
  // Point to the webapp directory (contains WEB-INF/web.xml)
  context.setResourceBase("src/main/webapp")
  // Scalatra's listener discovers ScalatraBootstrap and mounts servlets
  context.addEventListener(new ScalatraListener)
  context.addServlet(classOf[org.eclipse.jetty.servlet.DefaultServlet], "/")

  server.setHandler(context)
  server.start()
  server.join()
}
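The launcher's port lookup calls toInt directly, which throws if PORT is set to something non-numeric. A more defensive variant might look like the sketch below; PortConfig and resolvePort are hypothetical names, and toIntOption assumes Scala 2.13 or later.

```scala
object PortConfig {
  val DefaultPort = 8080

  // Read PORT from the given environment, falling back to the default when
  // the variable is missing, not a number, or outside the valid TCP range
  def resolvePort(env: Map[String, String]): Int =
    env.get("PORT")
      .flatMap(_.trim.toIntOption)
      .filter(p => p > 0 && p <= 65535)
      .getOrElse(DefaultPort)
}
```

In the launcher this would be called as PortConfig.resolvePort(sys.env). Taking the environment as a parameter keeps the function easy to test.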

5. Allow-list path guard

When multiple paths should bypass the filter, use a Set allow-list and a startsWith check for path prefixes. Cleaner than repeated || conditions in the guard.

// Exclude multiple paths from the bot filter using an allow-list
before() {
  // getPathInfo can be null, so wrap it before calling startsWith
  val path = Option(request.getPathInfo).getOrElse("")

  // Allow list: paths that bypass bot blocking
  val allowed = Set("/robots.txt", "/health", "/favicon.ico")
  val exempt  = allowed.contains(path) || path.startsWith("/public/")

  // `return` does not compile inside a before() block, so branch instead
  if (!exempt) {
    val ua = Option(request.getHeader("User-Agent")).getOrElse("")
    if (AiBotDetector.isAiBot(ua)) {
      response.setHeader("X-Robots-Tag", "noai, noimageai")
      halt(403, "Forbidden")
    }

    response.setHeader("X-Robots-Tag", "noai, noimageai")
  }
}
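The allow-list decision can also be pulled out into a pure function so it can be unit-tested without a servlet container. The following is a minimal sketch; PathGuard and isExempt are hypothetical names, and the paths are the same ones used in the filter above.

```scala
object PathGuard {
  private val exactPaths = Set("/robots.txt", "/health", "/favicon.ico")
  private val prefixes   = Seq("/public/")

  // True when the path should bypass bot blocking.
  // Accepts null because HttpServletRequest.getPathInfo may return it.
  def isExempt(rawPath: String): Boolean = {
    val path = Option(rawPath).getOrElse("")
    exactPaths.contains(path) || prefixes.exists(p => path.startsWith(p))
  }
}
```

The trailing slash in the prefix matters: "/public/" exempts "/public/css/site.css" but not an unrelated path such as "/publication".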

6. halt() with JSON body — API variant

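For API-only servlets, pass a Map to halt(). Scalatra serialises it to JSON only when JSON support is active: add scalatra-json plus a json4s backend (json4s-jackson or json4s-native), mix JacksonJsonSupport or NativeJsonSupport into the servlet, provide an implicit jsonFormats, and set contentType = formats("json") before the halt() call. The body named argument accepts any type that Scalatra knows how to render.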

// halt() with a map body — Scalatra renders it as JSON when format is json
// Useful for API-only servlets where clients expect JSON error responses.

before() {
  // `return` does not compile inside a before() block, so branch instead
  if (request.getPathInfo != "/robots.txt") {
    val ua = Option(request.getHeader("User-Agent")).getOrElse("")
    if (AiBotDetector.isAiBot(ua)) {
      response.setHeader("X-Robots-Tag", "noai, noimageai")
      // halt with a Map: Scalatra serialises to JSON if JSON format is active
      halt(403, body = Map("error" -> "Forbidden", "status" -> 403))
    }

    response.setHeader("X-Robots-Tag", "noai, noimageai")
  }
}
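If the JSON module is not mixed in, a Map body will most likely be written out via its toString rather than as JSON. One fallback, sketched here under that assumption, is to pre-render the error body as a string and pass it to halt() with the content type set to application/json. JsonError and render are hypothetical helper names, not part of Scalatra.

```scala
object JsonError {
  // Escape the characters that must not appear raw inside a JSON string
  private def escape(s: String): String = s.flatMap {
    case '"'          => "\\\""
    case '\\'         => "\\\\"
    case '\n'         => "\\n"
    case '\r'         => "\\r"
    case '\t'         => "\\t"
    case c if c < ' ' => "\\u%04x".format(c.toInt)
    case c            => c.toString
  }

  // Build a small JSON object such as {"error":"Forbidden","status":403}
  def render(status: Int, message: String): String =
    s"""{"error":"${escape(message)}","status":$status}"""
}
```

Inside the filter this would be used as halt(403, JsonError.render(403, "Forbidden")) after calling response.setContentType("application/json").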

7. web.xml — static robots.txt via DefaultServlet

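Map Jetty's DefaultServlet to /robots.txt in web.xml so that a static file at src/main/webapp/robots.txt is served without reaching Scalatra. An exact-match servlet mapping takes precedence over Scalatra's /* mapping, so with this configuration the before() filter never fires for that path. The path guard remains a safety net for other configurations.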

<?xml version="1.0" encoding="UTF-8"?>
<!-- src/main/webapp/WEB-INF/web.xml -->
<!-- Minimal web.xml for Scalatra with embedded Jetty.
     The XML declaration must be the first line of the file. -->
<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
           http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
         version="3.1">

  <listener>
    <listener-class>org.scalatra.servlet.ScalatraListener</listener-class>
  </listener>

  <!-- Optional: serve static files (including robots.txt) via DefaultServlet
       before Scalatra handles the request -->
  <servlet>
    <servlet-name>default</servlet-name>
    <servlet-class>org.eclipse.jetty.servlet.DefaultServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>default</servlet-name>
    <url-pattern>/robots.txt</url-pattern>
  </servlet-mapping>
</web-app>
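Why /robots.txt reaches the DefaultServlet rather than Scalatra follows from the Servlet specification's mapping order: exact matches first, then the longest path-prefix pattern, then the default servlet. The sketch below is a simplified model of that lookup, not container code; it omits extension mappings, and MappingModel, resolve and the servlet labels are hypothetical.

```scala
object MappingModel {
  // mappings: url-pattern -> servlet name (names are hypothetical labels)
  def resolve(mappings: Map[String, String], path: String): Option[String] = {
    // 1. An exact match wins
    val exact =
      if (path != "/" && !path.endsWith("/*")) mappings.get(path) else None

    // 2. Otherwise the longest matching "/prefix/*" pattern
    val byPrefix = mappings.toSeq
      .collect { case (pattern, name) if pattern.endsWith("/*") => (pattern.dropRight(2), name) }
      .filter { case (prefix, _) => path == prefix || path.startsWith(prefix + "/") }
      .sortBy { case (prefix, _) => -prefix.length }
      .headOption
      .map { case (_, name) => name }

    // 3. Finally the default servlet mapped at "/"
    exact.orElse(byPrefix).orElse(mappings.get("/"))
  }
}
```

With Map("/*" -> "scalatra", "/robots.txt" -> "default"), the model resolves /robots.txt to the default servlet and every other path to Scalatra.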

8. build.sbt

// build.sbt
val ScalatraVersion = "2.8.4"

lazy val root = project
  .in(file("."))
  .settings(
    name         := "my-scalatra-app",
    scalaVersion := "2.13.12",
    libraryDependencies ++= Seq(
      "org.scalatra" %% "scalatra"         % ScalatraVersion,
      "org.scalatra" %% "scalatra-json"    % ScalatraVersion,  // optional JSON support
      "ch.qos.logback" % "logback-classic" % "1.4.14" % Runtime,
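      // Embedded Jetty: compile scope so JettyLauncher can reference it
      // and sbt assembly can package it into the fat JAR
      "org.eclipse.jetty" % "jetty-webapp"  % "9.4.53.v20231009",
      "javax.servlet"     % "javax.servlet-api" % "3.1.0"        % Provided
    )
    // Build plugins (sbt-assembly, xsbt-web-plugin) belong in project/plugins.sbt
  )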
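The assembly and jetty:start commands used in this guide come from sbt plugins rather than from sbt itself. A minimal project/plugins.sbt might look like the following; the version numbers are assumptions and should be checked against each plugin's current release.

```scala
// project/plugins.sbt
// Versions are assumptions; check each plugin's releases for a current one.

// Provides the `assembly` task used to build the fat JAR
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")

// Provides the `jetty:start` / `jetty:stop` tasks used for hot reload
addSbtPlugin("com.earldouglas" % "xsbt-web-plugin" % "4.2.4")
```

With xsbt-web-plugin, the Jetty tasks are typically switched on by adding enablePlugins(JettyPlugin) to the project definition in build.sbt.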

Framework comparison — Scala web frameworks

| Framework | Hook / filter | Block call | UA header |
|---|---|---|---|
| Scalatra | before() { } | halt(403, "Forbidden") | Option(request.getHeader("User-Agent")).getOrElse("") |
| Play Framework | EssentialFilter / ActionFilter | Future.successful(Forbidden("...")) | request.headers.get("User-Agent") |
| http4s | HttpRoutes middleware | Forbidden("...") | req.headers.get[`User-Agent`] |
| ZIO HTTP | Middleware composition | ZIO.succeed(Response.status(Status.Forbidden)) | req.header(Header.UserAgent) |

Scalatra is the most imperative of the Scala frameworks — mutable response object, halt() via exception, servlet API for headers. Play, http4s, and ZIO HTTP are all functional and return-value-based. Scalatra is the right choice for teams coming from Java servlets or Sinatra who want a familiar, low-ceremony API in Scala.

Dependencies

# Run with sbt
sbt run

# Build fat JAR
sbt assembly

# Run fat JAR
java -jar target/scala-2.13/my-scalatra-app-assembly-0.1.0-SNAPSHOT.jar

# Hot reload in development
sbt "~;jetty:stop;jetty:start"   # with xsbt-web-plugin

# Scalatra version support
# Scalatra 2.8.x: Scala 2.12 / 2.13, Jetty 9.x, javax.servlet
# Scalatra 3.x:   Scala 2.12 / 2.13 / 3, separate javax and jakarta artifacts
#                 (jakarta.servlet needs Jetty 11+, Jakarta EE 9+)