How to Block AI Bots in Java Dropwizard

Dropwizard is a production-ready Java framework that bundles Jetty (HTTP server), Jersey (JAX-RS), Jackson (JSON), and Metrics into a single deployable JAR. It is widely used for microservices and REST APIs in enterprise Java environments. Dropwizard's HTTP pipeline runs Jersey on top of Jetty, so bot blocking can happen at either layer. The idiomatic approach is a Jersey ContainerRequestFilter annotated with @PreMatching, which fires before route matching for every request. Two details matter: getHeaderString() returns null when the header is absent (not an empty string), and abortWith() skips the remaining request filters and the resource method, while the aborted response still passes through any globally bound ContainerResponseFilter. Setting X-Robots-Tag inside the request filter as well keeps blocked responses correct even when no response filter is registered.

1. Bot detection class

A static utility class with no external dependencies. String.contains() performs literal substring matching. toLowerCase(Locale.ROOT) is used instead of toLowerCase() because it is locale-independent: under a Turkish default locale, toLowerCase() maps "I" to the dotless "ı", so "ANTHROPIC-AI" would no longer match "anthropic-ai".

// src/main/java/com/example/filter/AiBotDetector.java
package com.example.filter;

import java.util.List;
import java.util.Locale;

public final class AiBotDetector {

    // All lowercase — matched against ua.toLowerCase(Locale.ROOT)
    private static final List<String> AI_BOT_PATTERNS = List.of(
        "gptbot",
        "chatgpt-user",
        "claudebot",
        "anthropic-ai",
        "ccbot",
        "google-extended",
        "cohere-ai",
        "meta-externalagent",
        "bytespider",
        "omgili",
        "diffbot",
        "imagesiftbot",
        "magpie-crawler",
        "amazonbot",
        "dataprovider",
        "netcraft"
    );

    private AiBotDetector() {}

    public static boolean isAiBot(String ua) {
        if (ua == null || ua.isEmpty()) return false;
        String lower = ua.toLowerCase(Locale.ROOT);
        // String.contains() — literal substring, no regex overhead
        return AI_BOT_PATTERNS.stream().anyMatch(lower::contains);
    }
}
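To make the locale point concrete, the following sketch lowercases the same User-Agent under Locale.ROOT and under a Turkish locale. The class name LocaleLowercaseDemo and the sample User-Agent string are illustrative assumptions, not part of the project above.

```java
import java.util.Locale;

public final class LocaleLowercaseDemo {

    private LocaleLowercaseDemo() {}

    // Returns true when the User-Agent, lowercased with the given locale,
    // contains the (already lowercase) pattern.
    public static boolean matches(String ua, String pattern, Locale locale) {
        return ua.toLowerCase(locale).contains(pattern);
    }

    public static void main(String[] args) {
        // Hypothetical upper-case User-Agent, used only to expose the issue.
        String ua = "Mozilla/5.0 (compatible; BYTESPIDER)";
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Locale.ROOT lowercases "I" to "i", so the pattern is found.
        System.out.println("ROOT:    " + matches(ua, "bytespider", Locale.ROOT));
        // The Turkish locale lowercases "I" to dotless "ı", so it is not.
        System.out.println("Turkish: " + matches(ua, "bytespider", turkish));
    }
}
```

The same mismatch would occur with the no-argument toLowerCase() on a JVM whose default locale is Turkish, which is why the detector pins Locale.ROOT.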

2. ContainerRequestFilter — @PreMatching

@PreMatching is the critical annotation. Without it, Jersey only runs the filter for requests that match a registered resource method, so paths that would 404 bypass it entirely. With @PreMatching the filter fires for all requests before any routing. getHeaderString() returns null when the header is absent, so always null-check. abortWith() sets the response and skips the remaining request filters and the resource method; it does not exit the filter method, so return immediately after calling it.

// src/main/java/com/example/filter/AiBotRequestFilter.java
package com.example.filter;

import jakarta.annotation.Priority;
import jakarta.ws.rs.Priorities;
import jakarta.ws.rs.container.ContainerRequestContext;
import jakarta.ws.rs.container.ContainerRequestFilter;
import jakarta.ws.rs.container.PreMatching;
import jakarta.ws.rs.core.HttpHeaders;
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.ext.Provider;
import java.io.IOException;

@Provider           // Auto-discovered by Jersey when scanning the package
@PreMatching        // Runs BEFORE route matching — fires for all requests,
                    // including those that would 404. More efficient: Jersey
                    // does no routing work for blocked requests.
@Priority(Priorities.AUTHENTICATION - 100)  // Run before authentication filters
public class AiBotRequestFilter implements ContainerRequestFilter {

    private static final String X_ROBOTS_TAG = "X-Robots-Tag";
    private static final String ROBOTS_VALUE = "noai, noimageai";

    @Override
    public void filter(ContainerRequestContext requestContext) throws IOException {
        // Path guard: let robots.txt through.
        // Dropwizard typically serves static assets from a separate path;
        // this guard handles the edge case where robots.txt is a dynamic resource.
        String path = requestContext.getUriInfo().getPath();
        if ("robots.txt".equals(path) || "/robots.txt".equals(path)) {
            return;
        }

        // getHeaderString() returns null when the header is absent.
        // HttpHeaders.USER_AGENT = "User-Agent" (JAX-RS constant)
        String ua = requestContext.getHeaderString(HttpHeaders.USER_AGENT);
        if (ua == null) ua = "";

        if (AiBotDetector.isAiBot(ua)) {
            Response blocked = Response
                .status(Response.Status.FORBIDDEN)
                .entity("Forbidden")
                .header(X_ROBOTS_TAG, ROBOTS_VALUE)
                .type("text/plain")
                .build();
            // abortWith() skips the remaining request filters and the resource
            // method. It does not exit this method, so return immediately:
            // any code placed after the call would otherwise still run.
            requestContext.abortWith(blocked);
            return;
        }
        // Pass-through: no action needed here.
        // X-Robots-Tag on passing responses is added by AiBotResponseFilter.
    }
}
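The decision the filter makes depends only on the request path and the User-Agent value, so it can be exercised without a running container. The sketch below restates that logic as a pure function. BlockDecision, its shouldBlock method, and the shortened pattern list are hypothetical stand-ins for AiBotRequestFilter and AiBotDetector, not the filter itself.

```java
import java.util.List;
import java.util.Locale;

public final class BlockDecision {

    // Shortened, illustrative subset of the detector's pattern list.
    private static final List<String> PATTERNS = List.of("gptbot", "claudebot", "ccbot");

    private BlockDecision() {}

    // Mirrors the filter: robots.txt always passes, a null or empty
    // User-Agent passes, and a User-Agent containing a bot token is blocked.
    public static boolean shouldBlock(String path, String userAgent) {
        if ("robots.txt".equals(path) || "/robots.txt".equals(path)) {
            return false;
        }
        if (userAgent == null || userAgent.isEmpty()) {
            return false;
        }
        String lower = userAgent.toLowerCase(Locale.ROOT);
        return PATTERNS.stream().anyMatch(lower::contains);
    }

    public static void main(String[] args) {
        System.out.println(shouldBlock("api/data", "Mozilla/5.0 (compatible; GPTBot/1.0)"));
        System.out.println(shouldBlock("robots.txt", "GPTBot/1.0"));
        System.out.println(shouldBlock("api/data", null));
    }
}
```

Keeping the decision separate from the JAX-RS plumbing is a design choice rather than a requirement; it mainly makes the path guard and the null handling easy to cover with plain unit tests.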

3. ContainerResponseFilter — X-Robots-Tag on passing responses

ContainerResponseFilter fires for every response Jersey produces, including responses created by abortWith(): JAX-RS passes an aborted response through the applicable response filters, which for a @PreMatching abort means the globally bound ones. A blocked response therefore reaches this filter with X-Robots-Tag already set by the request filter, so the filter uses putSingle() to overwrite the header rather than add(), which would append a second copy.

// src/main/java/com/example/filter/AiBotResponseFilter.java
package com.example.filter;

import jakarta.annotation.Priority;
import jakarta.ws.rs.Priorities;
import jakarta.ws.rs.container.ContainerRequestContext;
import jakarta.ws.rs.container.ContainerResponseContext;
import jakarta.ws.rs.container.ContainerResponseFilter;
import jakarta.ws.rs.ext.Provider;
import java.io.IOException;

@Provider
@Priority(Priorities.HEADER_DECORATOR)
public class AiBotResponseFilter implements ContainerResponseFilter {

    @Override
    public void filter(
        ContainerRequestContext  requestContext,
        ContainerResponseContext responseContext
    ) throws IOException {
        // Fires for every response, including those aborted in AiBotRequestFilter.
        // putSingle() replaces any existing value, so blocked responses (which
        // already carry the header) do not end up with it twice.
        responseContext.getHeaders().putSingle("X-Robots-Tag", "noai, noimageai");
    }
}

4. Application class — register filters

Register filters via environment.jersey().register() in the run() method. Explicit registration is preferred over classpath scanning (@Provider auto-discovery) in Dropwizard — it keeps the dependency graph visible and avoids surprises from classpath scanning.

// src/main/java/com/example/MyApplication.java
package com.example;

import com.example.filter.AiBotRequestFilter;
import com.example.filter.AiBotResponseFilter;
import com.example.resources.ApiResource;
import io.dropwizard.core.Application;
import io.dropwizard.core.setup.Bootstrap;
import io.dropwizard.core.setup.Environment;

public class MyApplication extends Application<MyConfiguration> {

    public static void main(String[] args) throws Exception {
        new MyApplication().run(args);
    }

    @Override
    public void initialize(Bootstrap<MyConfiguration> bootstrap) {
        // Initialisation — add bundles, commands here
    }

    @Override
    public void run(MyConfiguration configuration, Environment environment) {
        // Register Jersey filters — explicit registration is preferred over
        // @Provider auto-scanning in Dropwizard to keep dependencies clear.
        environment.jersey().register(new AiBotRequestFilter());
        environment.jersey().register(new AiBotResponseFilter());

        // Register resources
        environment.jersey().register(new ApiResource());
    }
}

5. Resource class

Standard JAX-RS resource. The request filter runs before Jersey matches a request to a resource method, so blocked requests never reach this class.

// src/main/java/com/example/resources/ApiResource.java
package com.example.resources;

import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import java.util.Map;

@Path("/api")
@Produces(MediaType.APPLICATION_JSON)
public class ApiResource {

    @GET
    @Path("/data")
    public Map<String, String> getData() {
        return Map.of("data", "value");
    }

    @GET
    @Path("/status")
    public Map<String, String> getStatus() {
        return Map.of("status", "ok");
    }
}
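Once the filters and the resource are registered, a quick manual check is to call an endpoint with a bot User-Agent and with a browser-like one and compare the results. The sketch below uses the JDK's java.net.http client. The class name BlockSmokeCheck, the URL http://localhost:8080/api/status, and the two sample User-Agent strings are assumptions for illustration; adjust them to the actual deployment.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public final class BlockSmokeCheck {

    private BlockSmokeCheck() {}

    // Builds a GET request that carries the given User-Agent header.
    public static HttpRequest requestWithUserAgent(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
            .header("User-Agent", userAgent)
            .GET()
            .build();
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed local address: port 8080, application context path "/".
        String url = "http://localhost:8080/api/status";
        HttpClient client = HttpClient.newHttpClient();

        for (String ua : new String[] {"GPTBot/1.0", "Mozilla/5.0"}) {
            try {
                HttpResponse<String> response = client.send(
                    requestWithUserAgent(url, ua),
                    HttpResponse.BodyHandlers.ofString());
                // allValues() shows every copy of the header, which makes an
                // accidentally duplicated X-Robots-Tag easy to spot.
                System.out.println(ua + " -> " + response.statusCode()
                    + ", X-Robots-Tag: " + response.headers().allValues("X-Robots-Tag"));
            } catch (IOException e) {
                System.out.println(ua + " -> server not reachable: " + e.getMessage());
            }
        }
    }
}
```

With the filters in place, the bot User-Agent should receive a 403 and the browser-like one the normal JSON response, both carrying the X-Robots-Tag header.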

6. Jetty Servlet filter — block at the lowest level

For maximum efficiency, block at the Jetty servlet layer before Jersey runs. This is more code but avoids all Jersey overhead for blocked requests. Register via environment.servlets().addFilter(). Use httpReq.getHeader() — the raw servlet API, not Jersey's getHeaderString().

// Alternative: Jetty ServletFilter — runs at the Servlet layer,
// BEFORE Jersey processes the request. More efficient; no Jersey context.
// Use this when you want to block at the lowest possible level.

// src/main/java/com/example/filter/AiBotServletFilter.java
package com.example.filter;

import jakarta.servlet.*;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;

public class AiBotServletFilter implements Filter {

    @Override
    public void doFilter(
        ServletRequest request,
        ServletResponse response,
        FilterChain chain
    ) throws IOException, ServletException {
        HttpServletRequest  httpReq  = (HttpServletRequest)  request;
        HttpServletResponse httpResp = (HttpServletResponse) response;

        String path = httpReq.getRequestURI();
        if ("/robots.txt".equals(path)) {
            chain.doFilter(request, response);
            return;
        }

        String ua = httpReq.getHeader("User-Agent");
        if (ua == null) ua = "";

        if (AiBotDetector.isAiBot(ua)) {
            httpResp.setStatus(HttpServletResponse.SC_FORBIDDEN);
            httpResp.setHeader("X-Robots-Tag", "noai, noimageai");
            httpResp.setContentType("text/plain");
            httpResp.getWriter().write("Forbidden");
            return;  // Do NOT call chain.doFilter() — stops the chain
        }

        // Pass through
        chain.doFilter(request, response);
        // Note: set any response headers before calling chain.doFilter().
        // Once the downstream handler has committed the response, later
        // header changes are ignored.
    }

    @Override public void init(FilterConfig filterConfig) {}
    @Override public void destroy() {}
}

// Register in Application.run(). Requires these imports:
//     java.util.EnumSet and jakarta.servlet.DispatcherType
// environment.servlets()
//     .addFilter("AiBotFilter", new AiBotServletFilter())
//     .addMappingForUrlPatterns(EnumSet.of(DispatcherType.REQUEST), true, "/*");

7. robots.txt — dynamic resource

Dropwizard does not have a built-in static file handler for /robots.txt at the application root. The cleanest options are to serve it from a reverse proxy such as Nginx in front of Dropwizard, or to add a dedicated JAX-RS resource. The path guard in AiBotRequestFilter ensures the filter passes the request through when robots.txt is served by Dropwizard.

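// Dynamic robots.txt resource: serves robots.txt at /robots.txt.
// Use this when no reverse proxy (such as Nginx) serves it as a static file.
// Register it in Application.run() like the other resources:
//     environment.jersey().register(new RobotsResource());

// Package assumed to match ApiResource.
package com.example.resources;

import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;

@Path("/robots.txt")
@Produces("text/plain")
public class RobotsResource {

    // Text blocks (""" ... """) require Java 15 or later.
    private static final String ROBOTS_CONTENT = """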
        User-agent: *
        Allow: /

        User-agent: GPTBot
        Disallow: /

        User-agent: ClaudeBot
        Disallow: /

        User-agent: CCBot
        Disallow: /

        User-agent: Google-Extended
        Disallow: /
        """;

    @GET
    public String getRobots() {
        return ROBOTS_CONTENT;
    }
}

// The AiBotRequestFilter path guard lets /robots.txt through:
// if ("robots.txt".equals(path)) return;
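Hand-maintained robots.txt rules can drift from the list of crawlers the filter blocks. One option is to generate the disallow groups from a single list of crawler tokens. This is a minimal sketch under that assumption: RobotsTxtBuilder and its build method are hypothetical names, and the tokens in main are the four used in the resource above.

```java
import java.util.List;

public final class RobotsTxtBuilder {

    private RobotsTxtBuilder() {}

    // Produces an allow-all default group followed by one
    // "Disallow: /" group per crawler token.
    public static String build(List<String> disallowedAgents) {
        StringBuilder sb = new StringBuilder("User-agent: *\nAllow: /\n");
        for (String agent : disallowedAgents) {
            sb.append("\nUser-agent: ").append(agent).append("\nDisallow: /\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build(List.of("GPTBot", "ClaudeBot", "CCBot", "Google-Extended")));
    }
}
```

A resource method could return the result of build(...) instead of a hard-coded string, and the result can be computed once at startup since the list does not change at runtime.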

8. config.yml

# config.yml — Dropwizard configuration
server:
  type: simple
  applicationContextPath: /
  connector:
    type: http
    port: 8080

  # Serve static assets (including robots.txt) from the admin path or a
  # separate static asset bundle. For robots.txt at /, use a dynamic resource
  # or place it in the web server (Nginx) upstream of Dropwizard.

logging:
  level: INFO
  loggers:
    com.example: DEBUG

Key points

Framework comparison — Java REST frameworks

Framework | Filter / interceptor | Block call | UA header
Dropwizard (Jersey) | ContainerRequestFilter @PreMatching | ctx.abortWith(Response.status(403).build()) | ctx.getHeaderString(HttpHeaders.USER_AGENT)
Spring Boot | HandlerInterceptor or OncePerRequestFilter | response.setStatus(403); return false | request.getHeader("User-Agent")
Micronaut | HttpServerFilter | return HttpResponse.status(FORBIDDEN) | request.getHeaders().get("User-Agent", "")
Quarkus (RESTEasy) | ContainerRequestFilter @PreMatching | ctx.abortWith(Response.status(403).build()) | ctx.getHeaderString("User-Agent")

Dropwizard and Quarkus (RESTEasy) share the same JAX-RS ContainerRequestFilter API, so code is largely portable between them. The key difference is that Quarkus uses RESTEasy Reactive by default (Quarkus 2+), which has a slightly different threading model. Spring Boot's HandlerInterceptor is MVC-specific and fires later than a servlet filter, while Dropwizard's @PreMatching filter fires at the Jersey layer, after any servlet filters but before Jersey matches the request to a resource.

Dependencies

<!-- pom.xml -->
<dependency>
  <groupId>io.dropwizard</groupId>
  <artifactId>dropwizard-core</artifactId>
  <version>4.0.7</version>
</dependency>

# Build and run (shell)
mvn clean package
java -jar target/myapp-1.0-SNAPSHOT.jar server config.yml

<!-- Dropwizard 4.0.x bundles (no extra deps needed for filtering):
  - Jetty 11 (HTTP server)
  - Jersey 3.0.x (JAX-RS 3.0 / Jakarta EE 9)
  - Jackson 2.x (JSON)
  - Metrics 4.x
  All shipped in the fat JAR via dropwizard-core. -->