How to Block AI Bots with Java Servlet: Complete 2026 Guide
The Java Servlet API is the foundational HTTP layer beneath Spring Boot, Spring MVC, Jakarta EE, Tomcat, Jetty, and WildFly. Bot blocking at this level uses a Filter — a class that intercepts every request before your servlets or controllers execute. The same filter works on any servlet container, regardless of framework.
javax.servlet vs jakarta.servlet
Jakarta EE 9 (2020) renamed the core packages from javax.servlet to jakarta.servlet. Tomcat 9 and earlier use javax.servlet; Tomcat 10 and later use jakarta.servlet. The API is identical — only the import path differs. Deploying a WAR built against the wrong namespace causes ClassNotFoundException at startup.
javax.servlet.* — Tomcat 9, JBoss EAP 7, Jetty 9–11
jakarta.servlet.* — Tomcat 10+, WildFly 23+, Payara 6+
Four protection layers
Layer 1: robots.txt
Place robots.txt in your web application root. Servlet containers serve static files from this directory before any filter or servlet executes — no controller or configuration needed.
WAR / Maven project
In a standard Maven WAR project, the web root is src/main/webapp/. Files here are copied to the root of the WAR and served directly by the container at /robots.txt.
# src/main/webapp/robots.txt
User-agent: *
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: Applebot-Extended
User-agent: PerplexityBot
User-agent: Diffbot
User-agent: cohere-ai
User-agent: FacebookBot
User-agent: omgili
User-agent: omgilibot
User-agent: Amazonbot
Disallow: /
Spring Boot (embedded Tomcat)
Spring Boot serves static files from src/main/resources/static/. Place robots.txt there — it becomes available at /robots.txt with no additional configuration.
Layer 2: noai meta tag
Add <meta name="robots" content="noai, noimageai" /> to your base JSP layout. This signals to compliant AI crawlers not to use your content for training.
Base layout fragment (JSP)
<%-- WEB-INF/layout/header.jsp --%>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title><c:out value="${pageTitle}" default="My App" /></title>
<%-- AI bot training opt-out. Per-page override: set request attribute "robots" --%>
<meta name="robots"
content="${not empty robots ? robots : 'noai, noimageai'}" />
</head>
Per-page override
Set a robots request attribute in your servlet before forwarding to the JSP. The JSTL expression reads it and falls back to noai, noimageai when not set.
// Pages that should be indexed normally (e.g. a public about page):
request.setAttribute("robots", "index, follow");
request.getRequestDispatcher("/WEB-INF/views/about.jsp")
.forward(request, response);
Include the layout in each JSP
<%-- WEB-INF/views/index.jsp --%>
<%@ include file="/WEB-INF/layout/header.jsp" %>
<body>
<h1>Welcome</h1>
</body>
</html>
Layers 3 & 4: Servlet Filter
A single Filter handles both the X-Robots-Tag header and the hard 403 block. Filters intercept every request before any servlet or controller executes — the correct layer for bot blocking.
AiBotFilter — Jakarta EE 9+ (jakarta.servlet)
Use this for Tomcat 10+, WildFly 23+, Payara 6+, GlassFish 6+. For Tomcat 9 and earlier, replace every jakarta.servlet import with javax.servlet.
// src/main/java/com/example/filter/AiBotFilter.java
package com.example.filter;
import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.FilterConfig;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.annotation.WebFilter;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.util.Set;
import java.util.regex.Pattern;
@WebFilter("/*")
public class AiBotFilter implements Filter {
private static final Pattern AI_BOTS = Pattern.compile(
"(?i)gptbot|claudebot|anthropic-ai|google-extended|ccbot" +
"|bytespider|applebot-extended|perplexitybot|diffbot" +
"|cohere-ai|facebookbot|omgili|omgilibot|amazonbot" +
"|deepseekbot|mistralbot|xai-bot|ai2-bot"
);
// These paths must remain accessible — crawlers need robots.txt
private static final Set<String> EXEMPT_PATHS = Set.of(
"/robots.txt", "/sitemap.xml", "/favicon.ico"
);
@Override
public void init(FilterConfig config) {}
@Override
public void doFilter(
ServletRequest req, ServletResponse res, FilterChain chain
) throws IOException, ServletException {
HttpServletRequest request = (HttpServletRequest) req;
HttpServletResponse response = (HttpServletResponse) res;
String path = request.getRequestURI()
.substring(request.getContextPath().length());
// Always allow exempt paths — never block robots.txt
if (EXEMPT_PATHS.contains(path)) {
chain.doFilter(req, res);
return;
}
String ua = request.getHeader("User-Agent");
// Layer 4: Hard 403 — block before any servlet runs
if (ua != null && AI_BOTS.matcher(ua).find()) {
response.sendError(HttpServletResponse.SC_FORBIDDEN, "Forbidden");
return; // Do NOT call chain.doFilter() after blocking
}
// Layer 3: X-Robots-Tag — must be set before the response is committed,
// otherwise the container silently ignores the header
response.setHeader("X-Robots-Tag", "noai, noimageai");
// Pass legitimate requests through
chain.doFilter(req, res);
}
@Override
public void destroy() {}
}
Key points

- @WebFilter("/*") — registers this filter for all URLs. Requires Servlet 3.0+ (Tomcat 7+). Spring Boot users also need @ServletComponentScan on the main class.
- response.sendError(403) + return — sends the error response and halts the filter chain. Omitting return would continue processing after the error, which is incorrect.
- The EXEMPT_PATHS check comes before the UA check. Without it, robots.txt returns 403 and breaks the bot-blocking protocol.
- X-Robots-Tag applies only to responses served normally, never to 403 bot blocks — the blocking branch returns before the header is set. Headers must be set before the response is committed; once the body has been flushed, setHeader() is silently ignored.
- request.getHeader("User-Agent") matches the header name case-insensitively per the Servlet specification. Case variations in the User-Agent value itself are handled by the (?i) flag in the regex — no manual lowercasing needed.
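Because the blocking decision is just a regex test, it can be sanity-checked outside the container. A minimal standalone sketch (plain Java, duplicating the AI_BOTS pattern from the filter; the class name here is illustrative):

```java
import java.util.regex.Pattern;

public class UaCheck {

    // Same pattern as the filter's AI_BOTS field
    static final Pattern AI_BOTS = Pattern.compile(
        "(?i)gptbot|claudebot|anthropic-ai|google-extended|ccbot"
        + "|bytespider|applebot-extended|perplexitybot|diffbot"
        + "|cohere-ai|facebookbot|omgili|omgilibot|amazonbot"
        + "|deepseekbot|mistralbot|xai-bot|ai2-bot"
    );

    // Mirrors the null-guarded check in AiBotFilter.doFilter()
    static boolean isAiBot(String ua) {
        return ua != null && AI_BOTS.matcher(ua).find();
    }

    public static void main(String[] args) {
        // Substring match, case-insensitive thanks to (?i)
        System.out.println(isAiBot(
            "Mozilla/5.0 AppleWebKit/537.36; compatible; GPTBot/1.0")); // true
        // A normal browser UA passes through
        System.out.println(isAiBot(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0")); // false
        // Missing User-Agent header is not treated as a bot
        System.out.println(isAiBot(null)); // false
    }
}
```

Note the trade-off on the last case: requests with no User-Agent header are allowed through. If you prefer to treat a missing UA as suspicious, invert that branch — but expect false positives from health checks and internal tooling.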
Alternative: web.xml registration
If you need explicit ordering or Servlet 2.x compatibility, register the filter in web.xml instead of using the annotation. Remove the @WebFilter annotation from the class first.
<!-- src/main/webapp/WEB-INF/web.xml -->
<web-app xmlns="https://jakarta.ee/xml/ns/jakartaee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://jakarta.ee/xml/ns/jakartaee
https://jakarta.ee/xml/ns/jakartaee/web-app_6_0.xsd"
version="6.0">
<filter>
<filter-name>AiBotFilter</filter-name>
<filter-class>com.example.filter.AiBotFilter</filter-class>
</filter>
<filter-mapping>
<filter-name>AiBotFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
</web-app>
For Tomcat 9 / Java EE 8, use the http://xmlns.jcp.org/xml/ns/javaee namespace and version="4.0". Multiple <filter-mapping> entries execute in declaration order — place the bot filter first to block before any other filter processes the request.
Maven dependencies
The Servlet API is provided by the container at runtime — always use scope=provided so it is not bundled in your WAR.
Tomcat 10+ / Jakarta EE 9+
<dependency>
    <groupId>jakarta.servlet</groupId>
    <artifactId>jakarta.servlet-api</artifactId>
    <version>6.0.0</version>
    <scope>provided</scope>
</dependency>
Tomcat 9 / Java EE 8 and earlier
<dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>javax.servlet-api</artifactId>
    <version>4.0.1</version>
    <scope>provided</scope>
</dependency>
Deployment
Build a WAR and deploy to any servlet container:
# Build
mvn clean package -DskipTests

# Deploy to Tomcat
cp target/myapp.war /opt/tomcat/webapps/ROOT.war

# Tomcat auto-deploys on file change; or restart:
/opt/tomcat/bin/shutdown.sh && /opt/tomcat/bin/startup.sh
Container support: Apache Tomcat (7–11), Eclipse Jetty (9–12), WildFly / JBoss EAP (16+), Payara (5+), GlassFish (5+), and any Jakarta EE application server. Spring Boot users: register the filter as a @Bean of type Filter — Spring Boot wraps it in a FilterRegistrationBean automatically, or add @ServletComponentScan to enable @WebFilter discovery.
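The @Bean route mentioned above can be made explicit with a FilterRegistrationBean, which also gives you ordering control. A sketch only — it assumes the AiBotFilter class from this guide at com.example.filter.AiBotFilter; the configuration class and bean method names are illustrative:

```java
package com.example.config;

import com.example.filter.AiBotFilter;
import org.springframework.boot.web.servlet.FilterRegistrationBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.Ordered;

@Configuration
public class BotFilterConfig {

    @Bean
    public FilterRegistrationBean<AiBotFilter> aiBotFilterRegistration() {
        FilterRegistrationBean<AiBotFilter> registration =
                new FilterRegistrationBean<>(new AiBotFilter());
        registration.addUrlPatterns("/*");
        // Run before other filters so bots are rejected as early as possible
        registration.setOrder(Ordered.HIGHEST_PRECEDENCE);
        return registration;
    }
}
```

Pick one registration mechanism: if you register the filter this way, do not also enable @ServletComponentScan while the class carries @WebFilter, or the filter will run twice per request.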
FAQ
What is the difference between javax.servlet and jakarta.servlet?
javax.servlet is the Java EE namespace used in Tomcat 9 and earlier. jakarta.servlet is the Jakarta EE 9+ namespace used in Tomcat 10+, WildFly 23+, and modern Jakarta EE servers. The API is identical — only the import path changed. Deploying a WAR built against javax.servlet to Tomcat 10+ causes ClassNotFoundException at runtime. Match the dependency to your container version.
Should I use @WebFilter or web.xml?
@WebFilter is simpler — annotate the class and the container discovers it automatically. It requires Servlet 3.0+ (Tomcat 7+). web.xml gives you explicit filter ordering and works on all versions including Servlet 2.x. For a single bot-blocking filter, @WebFilter is sufficient. For multiple filters where execution order matters, prefer web.xml.
Does the filter run before the container serves robots.txt?
Yes — @WebFilter("/*") intercepts all requests, including static file requests. The EXEMPT_PATHS check at the top of doFilter() is essential. Without it, /robots.txt returns 403, which prevents crawlers from reading your disallow rules and breaks the robots.txt protocol.
How do I add noai meta tags in a Spring MVC or Thymeleaf app?
Add the meta tag to your Thymeleaf base layout: <meta name="robots" th:content="${robots ?: 'noai, noimageai'}">. In your controller, use model.addAttribute("robots", "index, follow") for pages that should be indexed normally. The Thymeleaf expression falls back to noai, noimageai when the attribute is absent.
Is this compatible with Spring Boot?
Yes. Spring Boot embeds Tomcat/Jetty/Undertow as a servlet container. Add @ServletComponentScan to your @SpringBootApplication class to enable @WebFilter discovery. Alternatively, declare the filter as a @Bean — Spring Boot registers it via FilterRegistrationBean automatically. For the Spring-idiomatic approach using OncePerRequestFilter and Spring Security integration, see the Spring Boot guide.