How to Block AI Bots in Java Dropwizard
Dropwizard is a production-ready Java framework that bundles Jetty (HTTP server), Jersey (JAX-RS), Jackson (JSON), and Metrics into a single deployable JAR. It is widely used for microservices and REST APIs in enterprise Java environments. Dropwizard's HTTP pipeline runs Jersey on top of Jetty, so bot blocking can happen at either layer. The idiomatic approach is a Jersey ContainerRequestFilter annotated with @PreMatching, which fires before route matching for every request that reaches Jersey. Two details matter: getHeaderString() returns null when the header is absent (not an empty string), and abortWith() short-circuits the remaining request filters and the resource method. Per the JAX-RS contract, the aborted response still passes through the applicable response filters. Setting X-Robots-Tag on the blocked response inside the request filter itself means the 403 carries the header regardless of which response filters are registered.
1. Bot detection class
A static utility class with no external dependencies. String.contains() performs literal substring matching. toLowerCase(Locale.ROOT) is used instead of toLowerCase() — locale-independent, avoids the Turkish locale I → ı edge case.
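To make the Turkish-locale point concrete, here is a minimal stdlib-only sketch. The class name LocaleLowercaseDemo and the sample User-Agent string are illustrative assumptions, not part of the filter code. It lowercases the same string under Locale.ROOT and under a Turkish locale, then checks whether the "bytespider" pattern still matches.

```java
import java.util.Locale;

final class LocaleLowercaseDemo {

    // Hypothetical sample; real crawler User-Agent strings vary.
    static final String SAMPLE_UA = "Mozilla/5.0 (compatible; BYTESPIDER)";

    // Same substring check as the detector, with the locale made explicit.
    static boolean matches(String ua, String pattern, Locale locale) {
        return ua.toLowerCase(locale).contains(pattern);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Under Locale.ROOT, "I" lowercases to "i", so the pattern is found.
        System.out.println("ROOT:    " + matches(SAMPLE_UA, "bytespider", Locale.ROOT));
        // Under Turkish rules, "I" lowercases to dotless "ı", so the pattern is missed.
        System.out.println("Turkish: " + matches(SAMPLE_UA, "bytespider", turkish));
    }
}
```

On a JVM whose default locale is Turkish, a bare toLowerCase() behaves like the second call, which is why the detector pins Locale.ROOT.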
// src/main/java/com/example/filter/AiBotDetector.java
package com.example.filter;

import java.util.List;
import java.util.Locale;

public final class AiBotDetector {

    // All lowercase; matched against ua.toLowerCase(Locale.ROOT)
    private static final List<String> AI_BOT_PATTERNS = List.of(
        "gptbot",
        "chatgpt-user",
        "claudebot",
        "anthropic-ai",
        "ccbot",
        "google-extended", // robots.txt token only; Google does not send it as a User-Agent
        "cohere-ai",
        "meta-externalagent",
        "bytespider",
        "omgili",
        "diffbot",
        "imagesiftbot",
        "magpie-crawler",
        "amazonbot",
        "dataprovider",
        "netcraft"
    );

    private AiBotDetector() {}

    public static boolean isAiBot(String ua) {
        if (ua == null || ua.isEmpty()) return false;
        String lower = ua.toLowerCase(Locale.ROOT);
        // String.contains() is a literal substring match, no regex overhead
        return AI_BOT_PATTERNS.stream().anyMatch(lower::contains);
    }
}

2. ContainerRequestFilter: @PreMatching
@PreMatching is the critical annotation. Without it, Jersey runs the filter only for requests that match a registered resource method, so 404 paths bypass it entirely. With it, the filter fires for every request before any routing. getHeaderString() returns null when the header is absent, so always null-check. abortWith() sets the response and stops further request processing, but it does not throw, so return immediately after calling it.
// src/main/java/com/example/filter/AiBotRequestFilter.java
package com.example.filter;

import jakarta.annotation.Priority;
import jakarta.ws.rs.Priorities;
import jakarta.ws.rs.container.ContainerRequestContext;
import jakarta.ws.rs.container.ContainerRequestFilter;
import jakarta.ws.rs.container.PreMatching;
import jakarta.ws.rs.core.HttpHeaders;
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.ext.Provider;
import java.io.IOException;

@Provider     // Auto-discovered by Jersey when scanning the package
@PreMatching  // Runs BEFORE route matching: fires for all requests,
              // including those that would 404. More efficient: Jersey
              // does no routing work for blocked requests.
@Priority(Priorities.AUTHENTICATION - 100) // Run before authentication filters
public class AiBotRequestFilter implements ContainerRequestFilter {

    private static final String X_ROBOTS_TAG = "X-Robots-Tag";
    private static final String ROBOTS_VALUE = "noai, noimageai";

    @Override
    public void filter(ContainerRequestContext requestContext) throws IOException {
        // Path guard: let robots.txt through.
        // Dropwizard typically serves static assets from a separate path;
        // this guard handles the edge case where robots.txt is a dynamic resource.
        String path = requestContext.getUriInfo().getPath();
        if ("robots.txt".equals(path) || "/robots.txt".equals(path)) {
            return;
        }

        // getHeaderString() returns null when the header is absent.
        // HttpHeaders.USER_AGENT = "User-Agent" (JAX-RS constant)
        String ua = requestContext.getHeaderString(HttpHeaders.USER_AGENT);
        if (ua == null) ua = "";

        if (AiBotDetector.isAiBot(ua)) {
            Response blocked = Response
                .status(Response.Status.FORBIDDEN)
                .entity("Forbidden")
                .header(X_ROBOTS_TAG, ROBOTS_VALUE)
                .type("text/plain")
                .build();
            // abortWith() records this response and stops request processing.
            // It does not throw, so return explicitly to skip any code below.
            requestContext.abortWith(blocked);
            return;
        }

        // Pass-through: no action needed here.
        // X-Robots-Tag on passing responses is added by AiBotResponseFilter.
    }
}

3. ContainerResponseFilter: X-Robots-Tag on passing responses
ContainerResponseFilter fires for responses produced by resource methods and, per the JAX-RS contract for abortWith(), for aborted responses as well. Blocked responses therefore pass through this filter too. Because the request filter already sets X-Robots-Tag on the 403, this filter overwrites the header with putSingle() instead of appending with add(), which avoids sending the header twice on blocked responses.
// src/main/java/com/example/filter/AiBotResponseFilter.java
package com.example.filter;

import jakarta.annotation.Priority;
import jakarta.ws.rs.Priorities;
import jakarta.ws.rs.container.ContainerRequestContext;
import jakarta.ws.rs.container.ContainerResponseContext;
import jakarta.ws.rs.container.ContainerResponseFilter;
import jakarta.ws.rs.ext.Provider;
import java.io.IOException;

@Provider
@Priority(Priorities.HEADER_DECORATOR)
public class AiBotResponseFilter implements ContainerResponseFilter {

    @Override
    public void filter(
        ContainerRequestContext requestContext,
        ContainerResponseContext responseContext
    ) throws IOException {
        // Runs for resource responses and, per the abortWith() contract,
        // for aborted responses too. putSingle() replaces any existing value,
        // so the 403 built in AiBotRequestFilter does not carry the header twice.
        responseContext.getHeaders().putSingle("X-Robots-Tag", "noai, noimageai");
    }
}

4. Application class: register filters
Register filters via environment.jersey().register() in the run() method. Explicit registration is preferred over classpath scanning (@Provider auto-discovery) in Dropwizard — it keeps the dependency graph visible and avoids surprises from classpath scanning.
// src/main/java/com/example/MyApplication.java
package com.example;

import com.example.filter.AiBotRequestFilter;
import com.example.filter.AiBotResponseFilter;
import com.example.resources.ApiResource;
import io.dropwizard.core.Application;
import io.dropwizard.core.setup.Bootstrap;
import io.dropwizard.core.setup.Environment;

public class MyApplication extends Application<MyConfiguration> {

    public static void main(String[] args) throws Exception {
        new MyApplication().run(args);
    }

    @Override
    public void initialize(Bootstrap<MyConfiguration> bootstrap) {
        // Initialisation: add bundles and commands here
    }

    @Override
    public void run(MyConfiguration configuration, Environment environment) {
        // Register Jersey filters. Explicit registration is preferred over
        // @Provider auto-scanning in Dropwizard to keep dependencies clear.
        environment.jersey().register(new AiBotRequestFilter());
        environment.jersey().register(new AiBotResponseFilter());

        // Register resources
        environment.jersey().register(new ApiResource());
    }
}

5. Resource class
Standard JAX-RS resource. The filter fires before Jersey dispatches to any resource method, so blocked requests never reach this class.
// src/main/java/com/example/resources/ApiResource.java
package com.example.resources;

import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import java.util.Map;

@Path("/api")
@Produces(MediaType.APPLICATION_JSON)
public class ApiResource {

    @GET
    @Path("/data")
    public Map<String, String> getData() {
        return Map.of("data", "value");
    }

    @GET
    @Path("/status")
    public Map<String, String> getStatus() {
        return Map.of("status", "ok");
    }
}

6. Jetty Servlet filter: block at the lowest level
For maximum efficiency, block at the Jetty servlet layer before Jersey runs. This is more code but avoids all Jersey overhead for blocked requests. Register via environment.servlets().addFilter(). Use httpReq.getHeader() — the raw servlet API, not Jersey's getHeaderString().
// Alternative: Jetty servlet Filter. Runs at the Servlet layer,
// BEFORE Jersey processes the request. More efficient; no Jersey context.
// Use this when you want to block at the lowest possible level.
// src/main/java/com/example/filter/AiBotServletFilter.java
package com.example.filter;

import jakarta.servlet.*;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;

public class AiBotServletFilter implements Filter {

    @Override
    public void doFilter(
        ServletRequest request,
        ServletResponse response,
        FilterChain chain
    ) throws IOException, ServletException {
        HttpServletRequest httpReq = (HttpServletRequest) request;
        HttpServletResponse httpResp = (HttpServletResponse) response;

        // getRequestURI() includes the context path; this comparison
        // assumes applicationContextPath is "/".
        String path = httpReq.getRequestURI();
        if ("/robots.txt".equals(path)) {
            chain.doFilter(request, response);
            return;
        }

        String ua = httpReq.getHeader("User-Agent");
        if (ua == null) ua = "";

        if (AiBotDetector.isAiBot(ua)) {
            httpResp.setStatus(HttpServletResponse.SC_FORBIDDEN);
            httpResp.setHeader("X-Robots-Tag", "noai, noimageai");
            httpResp.setContentType("text/plain");
            httpResp.getWriter().write("Forbidden");
            return; // Do NOT call chain.doFilter(); this stops the chain
        }

        // Pass through.
        // Note: to add headers to passing responses here, set them before
        // calling chain.doFilter(). Once the response is committed,
        // later header changes are ignored.
        chain.doFilter(request, response);
    }

    @Override public void init(FilterConfig filterConfig) {}
    @Override public void destroy() {}
}

// Register in Application.run()
// (needs imports java.util.EnumSet and jakarta.servlet.DispatcherType):
// environment.servlets()
//     .addFilter("AiBotFilter", new AiBotServletFilter())
//     .addMappingForUrlPatterns(EnumSet.of(DispatcherType.REQUEST), true, "/*");

7. robots.txt: dynamic resource
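Dropwizard core does not serve static files at the application root out of the box. The optional AssetsBundle from dropwizard-assets can serve them, but it competes with Jersey for the root path unless one of the two is moved. The cleanest options are to serve robots.txt from a web server such as Nginx placed in front of Dropwizard, or to add a dedicated JAX-RS resource. The path guard in AiBotRequestFilter ensures the filter passes the request through when Dropwizard serves it.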
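If the list of blocked crawlers changes often, the robots.txt body can be generated instead of hand-written. The following is a minimal stdlib-only sketch; the class name RobotsTxtBuilder and the token list passed to it are illustrative assumptions. The output follows the same layout as the hand-written resource in this section: an allow-all default group, then one Disallow group per token.

```java
import java.util.List;

final class RobotsTxtBuilder {

    private RobotsTxtBuilder() {}

    // Builds an allow-all default group followed by one Disallow group per token.
    static String build(List<String> blockedTokens) {
        StringBuilder sb = new StringBuilder();
        sb.append("User-agent: *\n").append("Allow: /\n");
        for (String token : blockedTokens) {
            sb.append('\n')
              .append("User-agent: ").append(token).append('\n')
              .append("Disallow: /\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build(List.of("GPTBot", "ClaudeBot", "CCBot", "Google-Extended")));
    }
}
```

A resource method could return RobotsTxtBuilder.build(...) computed once at startup, which keeps the robots.txt tokens next to whatever list drives the User-Agent filter.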
// Dynamic robots.txt resource: serves robots.txt at /robots.txt
// Use this when there is no Nginx in front of Dropwizard to serve it as a static file.
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;

@Path("/robots.txt")
@Produces("text/plain")
public class RobotsResource {

    private static final String ROBOTS_CONTENT = """
        User-agent: *
        Allow: /

        User-agent: GPTBot
        Disallow: /

        User-agent: ClaudeBot
        Disallow: /

        User-agent: CCBot
        Disallow: /

        User-agent: Google-Extended
        Disallow: /
        """;

    @GET
    public String getRobots() {
        return ROBOTS_CONTENT;
    }
}

// Register it in run(): environment.jersey().register(new RobotsResource());
// The AiBotRequestFilter path guard lets /robots.txt through:
// if ("robots.txt".equals(path)) return;

8. config.yml
# config.yml: Dropwizard configuration
server:
  type: simple
  applicationContextPath: /
  connector:
    type: http
    port: 8080

# Serve static assets (including robots.txt) from the admin path or a
# separate static asset bundle. For robots.txt at /, use a dynamic resource
# or place it in the web server (Nginx) in front of Dropwizard.
logging:
  level: INFO
  loggers:
    com.example: DEBUG

Key points
- @PreMatching is required for global coverage: Without @PreMatching, Jersey only invokes the filter for requests that match a registered resource method. Requests for unknown paths (404) bypass it. @PreMatching fires for every request before any routing occurs.
- getHeaderString() returns null, not empty string: Jersey's getHeaderString() returns null when the header is missing. Always null-check before calling .toLowerCase() or .contains().
- abortWith() responses still pass through response filters: Per the JAX-RS contract, an aborted response goes through the chain of applicable response filters. Set X-Robots-Tag directly on the abort response using .header() in the Response builder so the 403 carries it on its own, and have any response filter overwrite the header rather than append to it.
- Return after abortWith(): Unlike some frameworks where the abort/halt/error call throws, Jersey's abortWith() does not throw. Code after the call continues to run unless you return explicitly. Always return immediately after abortWith().
- Locale.ROOT for toLowerCase(): Use ua.toLowerCase(Locale.ROOT), not ua.toLowerCase(). The system-default locale can produce unexpected results in Turkish locales where "I".toLowerCase() returns "ı" (dotless i) instead of "i".
- Jersey vs Jetty layer: The Jersey ContainerRequestFilter is idiomatic for Dropwizard and has access to JAX-RS context. The Jetty servlet Filter runs earlier and is more efficient for blocking. Choose based on whether you need JAX-RS context inside the filter.
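The null-handling and case-handling points above can be checked without starting Dropwizard. The sketch below re-declares a trimmed-down detector (three patterns only) so that it is self-contained; the class name DetectorCheck and the sample User-Agent strings are illustrative assumptions, not captured traffic.

```java
import java.util.List;
import java.util.Locale;

final class DetectorCheck {

    // Trimmed copy of the pattern list, for illustration only.
    private static final List<String> PATTERNS = List.of("gptbot", "claudebot", "ccbot");

    static boolean isAiBot(String ua) {
        // A missing header arrives as null; treat it as "not a bot".
        if (ua == null || ua.isEmpty()) return false;
        String lower = ua.toLowerCase(Locale.ROOT);
        return PATTERNS.stream().anyMatch(lower::contains);
    }

    public static void main(String[] args) {
        System.out.println(isAiBot(null));                                            // false
        System.out.println(isAiBot(""));                                              // false
        System.out.println(isAiBot("Mozilla/5.0 (compatible; GPTBot/1.0)"));          // true
        System.out.println(isAiBot("Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0")); // false
    }
}
```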
Framework comparison — Java REST frameworks
| Framework | Filter / interceptor | Block call | UA header |
|---|---|---|---|
| Dropwizard (Jersey) | ContainerRequestFilter @PreMatching | ctx.abortWith(Response.status(403).build()) | ctx.getHeaderString(HttpHeaders.USER_AGENT) |
| Spring Boot | HandlerInterceptor or OncePerRequestFilter | response.setStatus(403); return false | request.getHeader("User-Agent") |
| Micronaut | HttpServerFilter | return HttpResponse.status(FORBIDDEN) | request.getHeaders().get("User-Agent", "") |
| Quarkus (RESTEasy) | ContainerRequestFilter @PreMatching | ctx.abortWith(Response.status(403).build()) | ctx.getHeaderString("User-Agent") |
Dropwizard and Quarkus (RESTEasy) share the same JAX-RS ContainerRequestFilter API, so code is largely portable between them. The key difference is that newer Quarkus versions favour RESTEasy Reactive (since renamed Quarkus REST), which has a different threading model from classic RESTEasy. Spring Boot's HandlerInterceptor is MVC-specific and fires later than a servlet filter, while Dropwizard's @PreMatching filter fires at the Jersey layer before resource matching.
Dependencies
<!-- pom.xml -->
<dependency>
<groupId>io.dropwizard</groupId>
<artifactId>dropwizard-core</artifactId>
<version>4.0.7</version>
</dependency>
<!-- Build and run -->
mvn clean package
java -jar target/myapp-1.0-SNAPSHOT.jar server config.yml
<!-- Dropwizard 4.0.x bundles (no extra deps needed for filtering):
- Jetty 11 (HTTP server)
- Jersey 3.0.x (JAX-RS 3.0 / Jakarta EE 9)
- Jackson 2.x (JSON)
- Metrics 4.x
All shipped in the fat JAR via dropwizard-core. -->
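Once the server is running, the filter can be smoke-tested from a second JVM. This is a hedged sketch using the JDK's java.net.http client (Java 11+). The URL http://localhost:8080/api/data assumes the config.yml and ApiResource from this article, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class BotBlockSmokeTest {

    // Builds a GET request that presents the given User-Agent.
    static HttpRequest probe(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        try {
            HttpResponse<String> res = client.send(
                    probe("http://localhost:8080/api/data", "GPTBot/1.0"),
                    HttpResponse.BodyHandlers.ofString());
            // With the filter registered, a 403 and an X-Robots-Tag header are expected.
            System.out.println("status: " + res.statusCode());
            System.out.println("X-Robots-Tag: "
                    + res.headers().firstValue("X-Robots-Tag").orElse("(none)"));
        } catch (IOException e) {
            System.out.println("Server not reachable: " + e.getMessage());
        }
    }
}
```

Repeating the call with an ordinary browser User-Agent should return the resource's normal response, which confirms the filter blocks only the matched crawlers.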