How to Block AI Bots in SparkJava

SparkJava is a lightweight Sinatra-inspired Java framework built on Jetty. Bot blocking uses the before() filter — a lambda that runs before every route handler. request.headers("User-Agent") is case-insensitive (wraps Jetty's HttpServletRequest.getHeader() which is case-insensitive per the HTTP specification). halt(403, "Forbidden") throws a HaltException — Spark catches it and sends the response immediately; code after halt() is unreachable. A plain return passes the request through.

1. Bot detection

Pure Java, no dependencies. Stream.anyMatch() short-circuits on first match. String.contains() for literal substring matching. Null-safe: returns false for null or empty input.

// AiBotDetector.java — AI bot detection, no external dependencies
import java.util.List;

public class AiBotDetector {

    private static final List<String> AI_BOT_PATTERNS = List.of(
        "gptbot",
        "chatgpt-user",
        "claudebot",
        "anthropic-ai",
        "ccbot",
        "google-extended",
        "cohere-ai",
        "meta-externalagent",
        "bytespider",
        "omgili",
        "diffbot",
        "imagesiftbot",
        "magpie-crawler",
        "amazonbot",
        "dataprovider",
        "netcraft"
    );

    /**
     * Returns true if the User-Agent string matches a known AI crawler.
     * String.contains() — literal substring match, no regex.
     * Null-safe: returns false for null or empty input.
     *
     * @param userAgent the raw User-Agent header value (may be null)
     * @return true if the request is from a known AI bot
     */
    public static boolean isAiBot(String userAgent) {
        if (userAgent == null || userAgent.isEmpty()) return false;
        final String lower = userAgent.toLowerCase(java.util.Locale.ROOT); // locale-safe (avoids Turkish-I surprises)
        return AI_BOT_PATTERNS.stream().anyMatch(lower::contains);
    }
}
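The detector can be exercised directly. The snippet below repeats a compact copy of the class (shortened pattern list) so it compiles standalone; the User-Agent strings are illustrative examples, not captured traffic.

```java
import java.util.List;
import java.util.Locale;

// Compact mirror of the AiBotDetector above, repeated so this file compiles on its own.
class AiBotDetector {
    static final List<String> AI_BOT_PATTERNS =
        List.of("gptbot", "chatgpt-user", "claudebot", "ccbot", "google-extended");

    static boolean isAiBot(String userAgent) {
        if (userAgent == null || userAgent.isEmpty()) return false;
        String lower = userAgent.toLowerCase(Locale.ROOT);
        return AI_BOT_PATTERNS.stream().anyMatch(lower::contains);
    }
}

public class DetectorDemo {
    public static void main(String[] args) {
        // Known AI crawler: matched case-insensitively by substring.
        System.out.println(AiBotDetector.isAiBot(
            "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)")); // true
        System.out.println(AiBotDetector.isAiBot(
            "CCBot/2.0 (https://commoncrawl.org/faq/)")); // true
        // Ordinary browser: passes through.
        System.out.println(AiBotDetector.isAiBot(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/125.0")); // false
        // Absent header: null-safe.
        System.out.println(AiBotDetector.isAiBot(null)); // false
    }
}
```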

2. Global before() filter

Register the filter before any route definitions. request.headers("User-Agent") returns null when the header is absent — the isAiBot() helper handles this. Set response headers before calling halt() because the response is committed when HaltException is thrown.

// App.java — SparkJava with global before() bot-blocking filter
import static spark.Spark.*;

public class App {

    public static void main(String[] args) {

        port(8080);

        // before(filter) — runs for EVERY request before any route handler.
        // Registered before route definitions so it fires first.
        before((request, response) -> {
            // Allow robots.txt so bots can discover Disallow rules.
            if ("/robots.txt".equals(request.pathInfo())) {
                return; // plain return = pass through, do not block
            }

            // request.headers("User-Agent") — case-insensitive (wraps Jetty
            // HttpServletRequest.getHeader). Returns null when absent.
            String ua = request.headers("User-Agent");

            if (AiBotDetector.isAiBot(ua)) {
                // Set headers BEFORE halt() — response is committed on throw.
                response.header("X-Robots-Tag", "noai, noimageai");
                response.type("text/plain");

                // halt(statusCode, body) throws HaltException immediately.
                // Spark catches it and sends the response.
                // Code after halt() is UNREACHABLE — no return needed.
                halt(403, "Forbidden");
            }

            // Pass: inject X-Robots-Tag on the way through, then continue.
            response.header("X-Robots-Tag", "noai, noimageai");
            // Plain return — Spark continues to the route handler.
        });

        // robots.txt — reachable by all crawlers.
        get("/robots.txt", (request, response) -> {
            response.type("text/plain");
            return """
                User-agent: *
                Allow: /

                User-agent: GPTBot
                Disallow: /

                User-agent: ClaudeBot
                Disallow: /

                User-agent: CCBot
                Disallow: /

                User-agent: Google-Extended
                Disallow: /
                """;
        });

        get("/", (request, response) -> {
            response.type("application/json");
            return "{\"message\": \"Hello\"}";
        });

        get("/api/data", (request, response) -> {
            response.type("application/json");
            return "{\"data\": \"value\"}";
        });
    }
}

3. How halt() works

halt() throws a HaltException — it does not return. Spark's filter runner catches this exception, writes the status and body, and skips all remaining filters and the route handler. This means any statement after halt() in the same lambda is dead code.

// halt() internals — what happens under the hood.

// halt(int status, String body) is equivalent to:
//   throw new HaltException(status, body);
//
// Spark's filter execution loop catches HaltException and:
//   1. Sets the HTTP status code on the response.
//   2. Writes the body string to the response.
//   3. Commits the response (no further writes possible).
//   4. Skips all remaining filters and the route handler.
//
// Because halt() throws, the JVM unwinds the stack immediately.
// Any statement after halt() in the same lambda is unreachable:

before((request, response) -> {
    if (AiBotDetector.isAiBot(request.headers("User-Agent"))) {
        response.header("X-Robots-Tag", "noai, noimageai");
        halt(403, "Forbidden");
        // ← Everything below is dead code. javac cannot flag it, because
        // halt() is an ordinary method call, so the statements are formally reachable.
        response.type("text/plain"); // NEVER executes
        System.out.println("blocked"); // NEVER executes
    }
});

// Contrast with plain return — pass through:
before((request, response) -> {
    if ("/public".equals(request.pathInfo())) {
        return; // exits the lambda, Spark continues to next filter/route
    }
    // ... bot check
});
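The exception-driven control flow can be reproduced in plain Java without Spark. MiniHalt and runFilter below are hypothetical stand-ins for HaltException and Spark's filter loop, not the real internals; the sketch only shows why code after a throwing call never runs.

```java
// MiniHaltDemo.java — a stripped-down model of halt()'s exception-based abort.
public class MiniHaltDemo {

    // Stand-in for Spark's HaltException.
    static class MiniHalt extends RuntimeException {
        final int status;
        final String body;
        MiniHalt(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    static void halt(int status, String body) {
        throw new MiniHalt(status, body); // unwinds the stack immediately
    }

    // Plays the role of Spark's filter runner: catch MiniHalt and short-circuit.
    static String runFilter(Runnable filter) {
        try {
            filter.run();
            return "200 handler ran";        // filter passed through
        } catch (MiniHalt h) {
            return h.status + " " + h.body;  // filter aborted the request
        }
    }

    public static void main(String[] args) {
        System.out.println(runFilter(() -> {
            halt(403, "Forbidden");
            System.out.println("never printed"); // dead code, exactly as in Spark
        })); // prints "403 Forbidden"
        System.out.println(runFilter(() -> { /* no halt: pass */ })); // prints "200 handler ran"
    }
}
```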

4. Path-scoped before(path, filter)

before(path, filter) restricts the filter to requests whose path matches the pattern. SparkJava supports * wildcard globs — "/api/*" matches /api/data, /api/users, and any other /api/ subpath.

// Path-scoped filter — protect /api/* only.
// before(path, filter) — path supports * wildcard globs.
// The global before() guard (above) remains for full-site protection;
// this shows the scoped variant independently.

before("/api/*", (request, response) -> {
    String ua = request.headers("User-Agent");
    if (AiBotDetector.isAiBot(ua)) {
        response.header("X-Robots-Tag", "noai, noimageai");
        halt(403, "Forbidden");
    }
    response.header("X-Robots-Tag", "noai, noimageai");
});

// Public routes — not covered by the /api/* filter.
get("/", (request, response) -> "public");
get("/blog", (request, response) -> "public blog");

// Protected routes — before("/api/*", ...) fires for these.
get("/api/data", (request, response) -> "protected");
get("/api/users", (request, response) -> "protected");
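The trailing-* semantics can be illustrated with a small matcher. matchesGlob is a hypothetical helper written for this article, not Spark's actual path matching; it covers only the "/prefix/*" form used above.

```java
// GlobDemo.java — illustrates the "/api/*" matching behavior from the filter above.
public class GlobDemo {

    // Hypothetical matcher: handles exact paths and the trailing-* wildcard form.
    static boolean matchesGlob(String pattern, String path) {
        if (pattern.endsWith("/*")) {
            // Keep the trailing '/' so "/api/*" matches "/api/data" but not "/apiv2".
            String prefix = pattern.substring(0, pattern.length() - 1);
            return path.startsWith(prefix);
        }
        return pattern.equals(path);
    }

    public static void main(String[] args) {
        System.out.println(matchesGlob("/api/*", "/api/data"));  // true
        System.out.println(matchesGlob("/api/*", "/api/users")); // true
        System.out.println(matchesGlob("/api/*", "/blog"));      // false
        System.out.println(matchesGlob("/", "/"));               // true
    }
}
```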

5. Framework comparison — Java web frameworks

Framework   | Filter registration                                      | Block request                                          | Header access
SparkJava   | before((req, res) -> …)                                  | halt(403, "Forbidden") (throws)                        | req.headers("User-Agent"), case-insensitive
Javalin     | app.before(ctx -> …)                                     | ctx.status(403).result("Forbidden") + skip             | ctx.header("User-Agent"), case-insensitive
Spring Boot | OncePerRequestFilter bean                                | response.sendError(403) or response.setStatus(403)     | request.getHeader("User-Agent"), case-insensitive
Quarkus     | @ServerRequestFilter or Vert.x router.route().handler()  | requestContext.abortWith(Response.status(403).build()) | headers.getHeaderString("User-Agent"), case-insensitive

SparkJava's halt() stands out among these frameworks: it uses a thrown exception to abort the filter chain rather than a return value or context flag. This makes the block absolute, since no downstream code in the same lambda can run after a halt() call. Javalin (SparkJava's spiritual successor) instead sets the status and body on a context object (ctx.status(403).result(...)), which can be easier to test. Spring Boot and Quarkus follow the Servlet and JAX-RS filter models respectively, both request-scoped with explicit chain.doFilter() / abortWith() semantics.