AI Bot Guides
Practical, copy-paste-ready guides for controlling AI crawlers, optimising for AI search, and protecting your content.
How to Block AI Bots on Hertz (Go): Complete 2026 Guide
Hertz is ByteDance's high-performance HTTP framework for Go, built on netpoll for epoll/kqueue-based I/O instead of net/http. Middleware signature: func(ctx context.Context, c *app.RequestContext). c.Request.Header.Get("User-Agent") returns an empty string when the header is absent, never nil. c.AbortWithStatus(consts.StatusForbidden) stops the handler chain, but code after the abort call still executes, so always return immediately after aborting. h.Use() registers global middleware on the server instance; h.Group("/api").Use() scopes middleware to a route group. c.Next(ctx) requires the context.Context parameter (unlike Gin's c.Next()). Hertz is NOT net/http compatible, so http.Handler middleware cannot be used directly. Porting from Gin is straightforward (same API patterns: c.Next, c.Abort, c.Set/Get) but the types differ (*app.RequestContext vs *gin.Context). c.Request.URI().Path() returns []byte; convert with string(). server.Default() includes recovery middleware, server.New() does not. c.AbortWithStatusJSON() for structured JSON error responses. 4-way comparison: Hertz vs Gin vs Fiber vs Echo.
How to Block AI Bots on Poem (Rust): Complete 2026 Guide
Poem is a full-featured async Rust web framework built on hyper and tokio. Middleware trait has one method: transform(&self, ep: E) -> Self::Output — called ONCE at startup to wrap the inner Endpoint. Per-request bot detection lives in the wrapper struct's Endpoint::call implementation. req.headers().get(header::USER_AGENT) returns Option<&HeaderValue>; to_str() returns Result<&str, ToStrError> — not infallible, HTTP headers can contain non-UTF-8 bytes (RFC 7230 obs-text). CRITICAL: extract User-Agent BEFORE calling self.0.call(req).await — Request is moved into the inner endpoint (Rust ownership). Block: return Ok(Response::builder().status(StatusCode::FORBIDDEN).body("Forbidden")) without calling inner. Pass: self.0.call(req).await?.into_response() then resp.headers_mut().insert() for X-Robots-Tag. .with(AiBotBlocker) on Route applies globally. .nest("/api", routes.with(AiBotBlocker)) for path-scoped blocking. #[handler] macro converts async fn to Endpoint — supports &Request (borrowed) for in-handler checking without ownership issues. .with() stacking: last applied = outermost. Simpler than Tower (Axum) — no Pin<Box<dyn Future>>, no associated Future types. Trade-off: Poem middleware is framework-specific, Tower middleware is reusable. 4-way comparison: Poem vs Axum vs Actix-web vs Rocket.
How to Block AI Bots on Rocket (Rust): Complete 2026 Guide
Rocket is a Rust web framework with a unique architecture: Fairings (lifecycle callbacks) CANNOT abort requests — they return () from on_request. Request Guards (FromRequest trait) are the idiomatic blocking mechanism: return Outcome::Error((Status::Forbidden, ())) to short-circuit with 403 before the handler runs. This guide covers: (1) Request Guard via FromRequest that checks req.headers().get_one("User-Agent") and returns Outcome::Error for AI bots — applied per-route as a function parameter (_guard: AiBotGuard). (2) X-Robots-Tag via Fairing on_response — res.set_raw_header() on all responses globally. (3) Global blocking via Fairing on_response override — rewrites the entire response to 403 after the handler runs (handler still executes, wasted computation). (4) robots.txt via FileServer::from("./static") or include_str!() compile-time embedding. (5) Custom Responder for noai meta tag + X-Robots-Tag combined. (6) #[catch(403)] catcher for clean error pages. Key distinction: Fairings are side-effects, Guards are access control — Rocket separates these by design. For zero-overhead global blocking, put Rocket behind nginx/Caddy. 4-way comparison: Rocket Request Guard vs Actix-web middleware vs Axum Tower layer vs Warp Filter.
How to Block AI Bots on Nitro: Complete 2026 Guide
Nitro is the universal server engine powering Nuxt 3, Analog, TanStack Start, and standalone deployments — compiles to Node, Bun, Deno, Cloudflare Workers, Vercel Edge, and AWS Lambda via presets. Two blocking approaches: (1) Plugin: defineNitroPlugin(nitroApp => { nitroApp.hooks.hook('request', async event => { ... }) }) — registers a request hook at startup, fires before routing. (2) Middleware: server/middleware/block-bots.ts with defineEventHandler — Nitro auto-loads all files in server/middleware/ in alphabetical order on every request. In both: getRequestHeader(event, 'user-agent') for UA access, createError({ statusCode: 403 }) to block, setResponseHeader(event, 'X-Robots-Tag', ...) for headers. robots.txt: server/routes/robots.txt.ts (filename maps to URL automatically) or static public/robots.txt (served before server routes). Dynamic block-list via useStorage() with TTL cache — works with fs, Redis, Cloudflare KV, and other Nitro storage drivers. server/utils/ auto-imported by Nitro — no import statements needed in handlers. 4-way comparison: Nitro vs Hono vs Bun HTTP vs Cloudflare Workers.
How to Block AI Bots in Supabase Edge Functions: Complete 2026 Guide
Supabase Edge Functions run on Deno at the network edge using the standard Fetch API (Request/Response). Block AI bots at the top of Deno.serve() before any database call — const ua = req.headers.get('user-agent')?.toLowerCase() ?? ''; if (AI_BOTS.some(b => ua.includes(b))) return new Response('Forbidden', { status: 403 }). Shared helper: create supabase/functions/_shared/ai-bots.ts with the UA list and isAiBot() check — Supabase bundles _shared/ at deploy time, import with relative path from any function. robots.txt: Supabase functions have no filesystem at runtime — embed as a string constant in a dedicated 'robots' function, deploy with --no-verify-jwt for public access. X-Robots-Tag on all non-403 responses. Dynamic block-list: query a Supabase table via service-role client, cache in module-level variable with TTL — update rules without redeploy. 4-way comparison: Supabase Edge vs Cloudflare Workers vs Vercel Edge Middleware vs AWS Lambda@Edge.
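The dynamic block-list idea above (fetch the list from a table, keep it in a module-level variable, refresh after a TTL) can be sketched in a language-neutral way. The Python below is only an illustration of the caching logic, not Supabase code: `TtlBlockList`, the `fetch` callable, and the 300-second default are hypothetical names and values chosen for the example.

```python
import time


class TtlBlockList:
    """Caches a list of bot tokens and refetches it once the TTL has elapsed."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch              # hypothetical: any callable returning an iterable of tokens
        self._ttl = ttl_seconds
        self._clock = clock              # injectable so the cache can be tested without sleeping
        self._tokens = ()
        self._expires_at = float("-inf")  # forces a fetch on first use

    def tokens(self):
        now = self._clock()
        if now >= self._expires_at:
            self._tokens = tuple(t.lower() for t in self._fetch())
            self._expires_at = now + self._ttl
        return self._tokens

    def is_ai_bot(self, user_agent):
        # A missing User-Agent is treated as an empty string, never as a match.
        ua = (user_agent or "").lower()
        return any(token in ua for token in self.tokens())
```

With this shape, editing the stored list changes blocking behaviour within one TTL window and no redeploy is needed; the trade-off is that a newly added token can take up to `ttl_seconds` to apply.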
How to Block AI Bots on Play Framework (Scala/Java): Complete 2026 Guide
Play Framework is a reactive web framework built on Pekko (formerly Akka) — used by Twitter, LinkedIn, and enterprise Java/Scala shops. Bot blocking uses EssentialFilter for global interception — receives RequestHeader, returns Accumulator[ByteString, Result]. Block before body parsing with Accumulator.done(Results.Forbidden) — zero overhead, body never read. EssentialFilter vs Filter: EssentialFilter can short-circuit pre-parse, Filter always parses first. Action composition for per-route control: ActionBuilder with invokeBlock checking UA before calling block(request). Play Java: @With(AiBotBlockerAction.class) annotation on controllers or individual actions. Dependency injection via Guice: @Inject() constructor. application.conf: play.filters.enabled += "filters.AiBotFilter" for global registration. Twirl template meta tag via implicit request. X-Robots-Tag via .map(_.withHeaders(...)) on result Future. Assets controller for robots.txt (conf/routes: GET /robots.txt controllers.Assets.at). Play 3.0+ uses Pekko (Apache fork of Akka) — filter API unchanged, only stream import paths differ. 4-way comparison: Play EssentialFilter vs Vert.x Handler vs Spring Boot Filter vs Akka HTTP Directive.
How to Block AI Bots on Vert.x (Java): Complete 2026 Guide
Vert.x is Eclipse's reactive toolkit for high-performance non-blocking web services on the JVM. Vert.x Web uses a Router with ordered Handler<RoutingContext> middleware — fully event-driven, no thread-per-request. Block AI bots by registering a handler on Router.route() (matches all paths) with order(-1) so it fires before any route handler. Check ctx.request().getHeader("User-Agent"), respond with ctx.response().setStatusCode(403).end() for AI bots, or ctx.next() for legitimate traffic. Set X-Robots-Tag via ctx.addHeadersEndHandler() — callback fires just before response flush, regardless of which handler produced the response. Serve robots.txt via StaticHandler on /robots.txt route (order -2, before bot blocker). Sub-router isolation: mount protected routes under Router.router(vertx) sub-router with its own bot blocker — public routes stay unblocked. Regex route matching: routeWithRegex() for fine-grained path control. EventBus service blocking for clustered deployments: pass User-Agent in DeliveryOptions headers, check in message consumer. Kotlin variant with first-class coroutine support. 4-way comparison: Vert.x Handler<RoutingContext> vs Spring Boot FilterRegistrationBean vs Micronaut HttpServerFilter vs Quarkus @ServerRequestFilter.
How to Block AI Bots on Feathers.js (Node.js): Complete 2026 Guide
Feathers.js adds a service + hooks architecture on top of Koa (v5 Dove, current) or Express (v4 Crow). Two distinct blocking layers: (1) Koa/Express HTTP middleware for hard 403 before Feathers routing — ctx.status = 403; ctx.body = 'Forbidden'; return (Koa) or res.status(403).send() (Express); (2) Feathers application hooks for cross-transport blocking (REST + WebSocket) — throw new Forbidden() from @feathersjs/errors. Application hooks receive HookContext with context.params.headers['user-agent'] for UA access. X-Robots-Tag: set downstream after await next() in Koa middleware (Koa onion model — code after next() runs on the way out). WebSocket blocking via app.on('connection', ...) connection hook — leave channels for AI bot connections before they receive any events; combine with application hooks for full coverage. Per-service scoping: app.service('articles').hooks({ before: { all: [blockAiBots] } }) instead of app.hooks() for global. robots.txt via koa-static before Feathers middleware. params.headers vs params.connection?.headers distinction — REST populates former, socket.io populates latter. v4 vs v5: same hook API, only transport layer differs. 4-way comparison: Feathers v5 (Koa) vs Feathers v4 (Express) vs Sails.js vs Hapi.js.
How to Block AI Bots on CherryPy (Python): Complete 2026 Guide
CherryPy uses a 'Tool' system instead of traditional middleware: callables attached to lifecycle hook points (before_handler, before_finalize, on_start_resource, etc.) and activated via config: 'tools.block_bots.on': True. Block by raising cherrypy.HTTPError(403) in before_handler, the same raise pattern as Falcon (raise HTTPForbidden) and Bottle (abort(403)), not Flask/Django's return pattern. Class-based Tool: subclass cherrypy.Tool, bind to a hook in __init__, and override _setup() to attach multiple hooks (before_handler for blocking + before_finalize for X-Robots-Tag). Function-based: @cherrypy.tools.register('before_handler') decorator. Config-driven activation is CherryPy's distinctive feature: enable globally via '/': {'tools.block_bots.on': True}, disable per-path via '/public': {'tools.block_bots.on': False}, or enable per-method via the @cherrypy.tools.block_bots() decorator. Tool parameters are configurable via config: 'tools.block_bots.strict': False. Priority ordering (0-100, lower = earlier within the same hook). staticfile tool for robots.txt (served before custom tools). cherrypy.request.robots for noai meta (Mako/Jinja2). 4-way comparison: CherryPy Tool vs Flask hook vs Django middleware vs Pyramid tween.
How to Block AI Bots on Pyramid (Python): Complete 2026 Guide
Pyramid uses 'tweens', its own middleware factory system. Each tween factory is an outer function receiving (handler, registry) that returns an inner function taking (request). Block by returning Response(status=403) without calling handler(request). Continue by calling handler(request) and mutating the response. Unlike Flask (return-based hooks) and Bottle (abort raises), Pyramid tweens explicitly thread the handler through: the same chain model as aiohttp and Django's get_response pattern, written as a closure rather than Django's class with __call__. OVER/UNDER ordering instead of numeric priority: config.add_tween('myapp.tweens.ai_bot_blocker_factory', under=pyramid.tweens.INGRESS) places the tween directly below the ingress, the outermost position available (Pyramid rejects over=INGRESS). The registry parameter gives the tween factory access to app-level settings (registry.settings). request.robots = 'noai, noimageai' for per-request template data; Pyramid Request objects are extensible. NewRequest events for metadata-only hooks (cannot abort). Path-prefix scoping inside the tween via request.path.startswith(). WebTest TestApp + pyramid.testing.DummyRequest for unit testing. 4-way comparison: Pyramid tween vs Django __call__ vs Flask return vs Bottle abort.
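A minimal sketch of the tween shape described above, assuming the (handler, registry) factory signature. To keep it runnable without Pyramid installed, `Response` here is a small stand-in class; a real project would import `pyramid.response.Response` instead. The bot tokens and the `ai_bots.path_prefix` setting name are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field

AI_BOTS = ("gptbot", "claudebot", "ccbot")  # illustrative tokens, not a complete list


@dataclass
class Response:
    """Stand-in for pyramid.response.Response so the sketch runs on its own."""
    status: int = 200
    headers: dict = field(default_factory=dict)


def ai_bot_blocker_factory(handler, registry):
    # Runs once at configuration time. "ai_bots.path_prefix" is a hypothetical
    # setting name; registry.settings is the app settings mapping.
    prefix = registry.settings.get("ai_bots.path_prefix", "/")

    def ai_bot_blocker(request):
        ua = (request.headers.get("User-Agent") or "").lower()
        if request.path.startswith(prefix) and any(bot in ua for bot in AI_BOTS):
            return Response(status=403)       # block: the handler is never called
        response = handler(request)           # continue down the tween chain
        response.headers["X-Robots-Tag"] = "noai, noimageai"
        return response

    return ai_bot_blocker
```

The outer function is where one-time setup belongs (reading settings, compiling patterns); the inner function runs per request, so it should stay cheap.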
How to Block AI Bots on Bottle (Python): Complete 2026 Guide
Bottle is Python's single-file micro-framework with zero dependencies. Blocking uses @app.hook('before_request') + abort(403) — raises HTTPError exception to stop the request. Unlike Flask (return a Response from before_request), Bottle hooks CANNOT return to block — returning has no effect, you must abort() or raise. This matches Falcon's raise pattern. after_request hook runs on ALL responses (including 403 errors) — use response.set_header('X-Robots-Tag', 'noai, noimageai'). Per-request data: request.environ['robots'] = 'noai, noimageai' (WSGI environ dict — no Flask g, no Sanic request.ctx). Plugin system for per-route control: class with apply(callback, route) method, route.config.get('skip_bot_check') for opt-out. static_file() for robots.txt (goes through hooks — must exempt). Sub-app mount() for path-prefixed isolation (mounted apps have independent hooks). Hooks are always global (no Blueprint scoping). 4-way comparison: Bottle abort() vs Flask return vs Falcon raise vs Django return.
How to Block AI Bots on aiohttp (Python): Complete 2026 Guide
aiohttp middleware uses a two-argument pattern: a function decorated with @web.middleware receives (request, handler). Call await handler(request) to continue to the next middleware or route; do NOT call it to block. Return web.Response(status=403, text='Forbidden') to block. Unlike Sanic (return None to continue, no handler argument) and Starlette (BaseHTTPMiddleware class with call_next), aiohttp passes the next handler explicitly, which is conceptually identical to Koa.js's await next(). The middleware list is FIFO (first = outermost = runs first), the same as Sanic and the opposite of Starlette's add_middleware() LIFO. request['robots'] = 'noai, noimageai' for per-request data (Request is a MutableMapping, unlike Sanic's request.ctx SimpleNamespace). Response headers are a mutable CIMultiDict: response.headers['X-Robots-Tag'] = value mutates in place (unlike PSR-7 withHeader). Sub-applications for scoped middleware: web.Application(middlewares=[blocker]) mounted via app.add_subapp('/api/', sub_app), the equivalent of Sanic Blueprints. Static routes go through middleware (unlike Sanic/Flask), so /robots.txt must be exempted explicitly. Old-style middleware factory (before 2.3) vs new-style @web.middleware (2.3+). 4-way comparison: aiohttp vs Sanic vs Starlette vs Koa.js.
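The FIFO versus LIFO ordering mentioned above is easy to get backwards, so here is a framework-free sketch of the two registration styles. It uses plain synchronous callables rather than aiohttp or Starlette, and the helper names (`make_middleware`, `build_from_list`, `build_by_adding`) are invented for this example.

```python
def make_middleware(name, log):
    """Return a middleware that records entry and exit around the next handler."""
    def middleware(handler):
        def wrapped(request):
            log.append(f"{name}:in")
            response = handler(request)
            log.append(f"{name}:out")
            return response
        return wrapped
    return middleware


def build_from_list(handler, middlewares):
    """List style: the first item ends up outermost, so wrap in reverse order."""
    for mw in reversed(middlewares):
        handler = mw(handler)
    return handler


def build_by_adding(handler, middlewares):
    """Incremental style: each later addition wraps everything added before it."""
    for mw in middlewares:
        handler = mw(handler)
    return handler


log = []
a, b = make_middleware("A", log), make_middleware("B", log)
build_from_list(lambda request: "ok", [a, b])("req")
# log is now ["A:in", "B:in", "B:out", "A:out"]: A was listed first and runs first.
```

For a bot blocker the practical rule is the same in both styles: make it the outermost layer, which means first in a list-style registration and last in an incremental one.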
How to Block AI Bots on Sails.js (Node.js): Complete 2026 Guide
Sails.js has two middleware layers: policies (after routing, per-controller/action) and HTTP middleware (before routing, Express-level). Primary approach: api/policies/isNotAiBot.js — call next() to continue, return res.status(403).send('Forbidden') to block. Global registration: config/policies.js with '*': ['isNotAiBot'] (applies to all controllers). Per-controller: 'UserController': { '*': ['isNotAiBot'] }. HTTP middleware alternative: config/http.js middleware.order array — runs before Sails routing, catches every request including assets. Key distinction: policies run AFTER routing (action resolved, can be per-action), HTTP middleware runs BEFORE routing (Express layer, no action context). robots.txt: assets/robots.txt (not public/) — Sails serves assets/ as static root via express.static. res.locals.robots for noai meta in EJS/Pug views — set in policy, read in layout template. X-Robots-Tag: custom response middleware in config/http.js or policy after() equivalent via res.set() before next(). Comparison: Sails.js policy vs Express middleware vs NestJS Guard — all Node.js, different abstraction levels.
How to Block AI Bots on Yii2 (PHP): Complete 2026 Guide
Yii2 uses a behavior-based filter system. Primary approach: ActionFilter behavior with beforeAction(Action $action): bool; return false to block after setting Yii::$app->response->statusCode=403, calling ->send(), and Yii::$app->end(). Override afterAction($action, $result) for X-Robots-Tag: Yii::$app->response->headers->add('X-Robots-Tag', 'noai, noimageai'). Document root is web/ (not public/), so web/robots.txt is served by Apache/nginx before PHP. Global registration: base controller with behaviors() array, or application 'on beforeRequest' / 'on afterRequest' events in config/web.php. Controller-scoped: add to behaviors() with optional 'only' => ['action1'] or 'except' => ['health']. Yii::$app->params['robots'] = 'noai, noimageai' in beforeAction() for template access via Yii::$app->params['robots'] ?? 'noai, noimageai'. Document root PHP comparison: Yii2 web/ vs CakePHP webroot/ vs Laravel/Symfony/CI4 public/. 4-way PHP comparison: Yii2 (return false + end()) vs Laravel (return response(403)) vs Symfony (setResponse) vs CakePHP (PSR-15 return). str_contains() PHP 8.0+ / strpos() PHP 7.4 fallback.
How to Block AI Bots on Micronaut (Java): Complete 2026 Guide
Micronaut uses a reactive filter system: HttpServerFilter.doFilter() returns Publisher<MutableHttpResponse<?>>. Block by returning Mono.just(HttpResponse.status(HttpStatus.FORBIDDEN).body("Forbidden")) without calling chain.proceed(); continue by returning Mono.from(chain.proceed(request)). @Filter("/**") for global registration; @Filter("/api/**") for path-scoped. getOrder() for priority (lower = runs first; use -100 to run before default auth). X-Robots-Tag: Mono.from(chain.proceed(request)).doOnNext(res -> res.header("X-Robots-Tag", ...)). MutableHttpResponse.header() mutates in place; unlike PSR-7, no withHeader() capture is needed. Static resources: src/main/resources/public/robots.txt served by Netty before filters. Kotlin variant: same Publisher return, Kotlin extension functions for Reactor. 3-way Java comparison: Micronaut (reactive Publisher) vs Quarkus (return Response/abortWith) vs Spring Boot (synchronous doFilterInternal). Testing: @MicronautTest + injected HttpClient with .toBlocking().exchange(). Micronaut vs Quarkus static resource paths: public/ vs META-INF/resources/.
How to Block AI Bots on Quarkus (Java): Complete 2026 Guide
Quarkus has two filter APIs. The modern one is @ServerRequestFilter(preMatching = true) for quarkus-resteasy-reactive: annotate a CDI bean method and return a Response to block or null to continue. The classic one is ContainerRequestFilter (@Provider @PreMatching) for quarkus-resteasy: implement filter() and call requestContext.abortWith(Response.status(403).build()) to block. Both run pre-matching (the preMatching attribute or the @PreMatching annotation), before JAX-RS routing, which is more efficient: the request is rejected before any routing work and auth filters are skipped entirely. Static resources: src/main/resources/META-INF/resources/robots.txt, served by the Vert.x HTTP layer before JAX-RS filters, no config needed. @ServerResponseFilter / ContainerResponseFilter.filter(req, res) for X-Robots-Tag. Priority: @Priority(Priorities.AUTHENTICATION - 100) = 900, which runs before auth (1000). abortWith() vs Spring return false vs Micronaut Mono.just(HttpResponse.forbidden()). 4-way comparison: Quarkus Reactive vs Quarkus Classic vs Spring Boot OncePerRequestFilter vs Micronaut HttpServerFilter. Testing: @QuarkusTest with RestAssured given().header("User-Agent", ...).when().get().then().statusCode(403).
How to Block AI Bots on Litestar (Python): Complete 2026 Guide
Litestar (formerly Starlite) is Python's modern full-featured ASGI framework, distinct from FastAPI/Starlette. Two middleware approaches: AbstractMiddleware (base class with scopes and exclude options; implement __call__(scope, receive, send)) and MiddlewareProtocol (pure ASGI __call__). For AI bot blocking, AbstractMiddleware is recommended: __call__() receives the raw ASGI triple; block by sending a 403 ASGI response directly instead of calling the inner app; pass through with await self.app(scope, receive, send). There is no call_next(), unlike Starlette's BaseHTTPMiddleware. DefineMiddleware(Cls, **kwargs) is Litestar's factory wrapper (equivalent to Starlette's Middleware()) for passing constructor args at registration. The middleware list is FIFO (first item = outermost = runs first), the opposite of Starlette's add_middleware() LIFO. X-Robots-Tag: wrap the send callable to intercept http.response.start messages and inject the header. Route/Router/Controller-scoped: middleware=[DefineMiddleware(AiBotBlocker)] parameter at any registration level, inherited top-down. StaticFilesConfig for robots.txt (resolved before middleware). scope['state']['robots'] for noai meta via state dict. TestClient from litestar.testing with custom headers. Comparison: Litestar __call__(scope, receive, send) vs FastAPI dispatch(request, call_next) vs pure ASGI __call__.
How to Block AI Bots on CakePHP: Complete 2026 Guide
CakePHP 4 and 5 implement PSR-15 MiddlewareInterface — same process(request, handler) pattern as Slim 4, but registered through Application::middleware() with a MiddlewareQueue object. Key distinction: use prepend() not add() — CakePHP's built-in stack (Security, CSRF, Session, Routing) processes requests in queue order, so add() would run the bot blocker after sessions start. prepend() runs the bot blocker first. Document root is webroot/ (not public/) — robots.txt goes in webroot/robots.txt and is served by Apache/nginx before PHP. PSR-7 immutability applies: $response = $response->withHeader('X-Robots-Tag', ...) — withHeader() returns a new object, must capture. Blocking: return (new Response())->withStatus(403)->withStringBody(...) without calling $handler->handle(). Request attributes: $request = $request->withAttribute('robots', 'noai, noimageai') — template reads $this->request->getAttribute('robots'). Route-scoped: $builder->registerMiddleware('aiBotBlocker', new AiBotBlocker()) + $builder->applyMiddleware('aiBotBlocker') inside a scope(). CakePHP 4 vs 5: identical middleware signatures, CakePHP 5 requires PHP 8.1+. Testing: IntegrationTestTrait configRequest(['headers' => [...]]) + assertResponseCode(403). 4-way comparison: CakePHP (MiddlewareQueue::prepend) vs Slim 4 (addMiddleware LIFO) vs Symfony (EventSubscriber priority) vs Laravel (bootstrap/app.php prepend).
How to Block AI Bots on Sanic (Python): Complete 2026 Guide
Sanic is Python's async-native web framework — no WSGI, built on its own event loop. Middleware uses @app.on_request and @app.on_response decorators. Blocking is return-based: return an HTTPResponse from on_request to stop the chain; return None to continue. Unlike Falcon (raise HTTPForbidden exception) and Starlette/FastAPI (ASGI BaseHTTPMiddleware class), Sanic shares Flask's return pattern but is fully async. Key distinctions: request.ctx (SimpleNamespace) for per-request data — equivalent to Flask's g but request-scoped and async-safe; app.ctx for application-level context. Middleware order: FIFO for on_request (first registered runs first), LIFO for on_response. Blueprint-scoped middleware via @bp.on_request applies only to that Blueprint's routes. Static files via app.static('/robots.txt', ...) resolved at routing layer before middleware runs. Old syntax: @app.middleware('request') / @app.middleware('response') — identical signatures to on_request/on_response, still supported in 21.12+. 4-way comparison: Sanic (async return) vs Flask (sync return) vs Falcon (raise) vs Starlette (ASGI call_next). Testing: app.test_client.get() with custom headers.
How to Block AI Bots on Falcon (Python): Complete 2026 Guide
Falcon is Python's bare-metal REST framework — class-based middleware with process_request(req, resp) and process_response(req, resp, resource, req_succeeded). Blocking is exception-based: raise falcon.HTTPForbidden() in process_request() rather than returning a response object — unlike Flask's before_request() (return Response) and Django's process_request() (return HttpResponse). process_request() runs before routing; process_resource(req, resp, resource, params) runs after routing but before the responder — use it to check the matched resource class. process_response() runs after the responder; req_succeeded is False if any exception was raised (including HTTPForbidden), so guard with if req_succeeded before setting X-Robots-Tag. Reading UA: req.get_header('User-Agent') or '' — Falcon normalizes headers case-insensitively. robots.txt: RobotsTxtResource with on_get() registered at /robots.txt; add to EXEMPT_PATHS. noai meta: req.context['robots'] = 'noai, noimageai' in process_request(); Jinja2 template reads {{ robots | default(...) }}. Path-scoped: check req.path.startswith('/api/') since Falcon has no per-group middleware. ASGI variant: falcon.asgi.App with async def process_request — identical signatures. Testing: falcon.testing.TestClient with simulate_get() and custom UA headers. Comparison: Falcon (raise) vs Flask (return) vs Django (return) — all stop the chain but different mechanisms.
How to Block AI Bots on Beego (Go): Complete 2026 Guide
Beego is a full-featured Go MVC framework with its own ORM, template engine, and session management. Unlike Gin, Echo, and Chi, which use next()-based middleware chains, Beego uses InsertFilter() with named execution points: BeforeStatic → BeforeRouter → BeforeExec → AfterExec → FinishRouter. Use BeforeRouter for bot blocking: it runs before routing, so blocked requests never hit the controller. Filter function signature: func(ctx *context.Context), with no next() call and no return value. BLOCK: ctx.Abort(403, "Forbidden") writes status + body and sets an internal flag that stops controller execution. X-Robots-Tag: ctx.ResponseWriter.Header().Set("X-Robots-Tag", "noai, noimageai"); ctx.ResponseWriter wraps http.ResponseWriter. Reading UA: ctx.Request.Header.Get("User-Agent"); ctx.Request is a standard *http.Request. robots.txt: the static/ directory is auto-served by Beego at /static/; for the /robots.txt root path use a web.Get("/robots.txt", ...) route registered before filters. noai meta: BaseController struct embeds beego.Controller, Prepare() sets Data["robots"] = "noai, noimageai"; the template uses {{.robots}}; per-controller override via c.Data["robots"] = "index, follow". Registration: beego.InsertFilter("*", beego.BeforeRouter, filters.AiBotFilter) global. Route-scoped: beego.InsertFilter("/api/*", beego.BeforeRouter, filters.AiBotFilter) with a glob pattern. 4-way comparison: Beego (no next), Gin (c.Next/c.Abort chain), Echo (wrapper), Chi (net/http wrapper).
How to Block AI Bots on Tornado (Python): Complete 2026 Guide
Tornado is Python's original async web framework and has no middleware stack. Bot blocking uses a BaseHandler class that all route handlers inherit from. prepare() is called before any get()/post() method, which makes it Tornado's equivalent of middleware. BLOCK: self.send_error(403) in prepare() finishes the response, so the route handler never runs. X-Robots-Tag: self.set_header('X-Robots-Tag', 'noai, noimageai') in prepare() after the bot check; Tornado buffers headers until finish(). Reading UA: self.request.headers.get('User-Agent', '').lower(); HTTPHeaders is case-insensitive, and the empty default prevents AttributeError. robots.txt: StaticFileHandler route registered FIRST, (r'/robots.txt', StaticFileHandler, {'path': 'static/robots.txt'}); Tornado matches routes in order, so static files are served before BaseHandler runs. noai meta: override get_template_namespace() in BaseHandler to inject a robots variable with a 'noai, noimageai' default. Per-handler opt-out: ALLOW_AI_BOTS = True class variable; BaseHandler.prepare() checks getattr(self, 'ALLOW_AI_BOTS', False) before UA matching. Async prepare(): declare async def prepare(self), useful for Redis rate-limit lookups before blocking. Comparison: vs FastAPI (BaseHTTPMiddleware dispatch) vs Django (process_request middleware). Tornado runs its own IOLoop, so no ASGI server is needed. Fork processes with tornado.process.fork_processes(0) for multi-core production.
How to Block AI Bots on Sinatra (Ruby): Complete 2026 Guide
Sinatra is a lightweight Ruby DSL built on Rack — the universal Ruby HTTP interface that also underpins Rails, Hanami, and Roda. Bot blocking uses a standard Rack middleware class: initialize(app) stores the next app; call(env) returns the Rack [status, headers, body] triplet. BLOCK: return [403, {'content-type' => 'text/plain'}, ['Forbidden']] — do NOT call @app.call(env). PASS THROUGH: status, headers, body = @app.call(env); headers['x-robots-tag'] = 'noai, noimageai'; [status, headers, body] — Rack returns a mutable headers Hash so X-Robots-Tag can be set after calling the inner app (unlike Go net/http where headers must be set before). Reading UA: env['HTTP_USER_AGENT'] — Rack normalises HTTP headers to HTTP_ prefix + uppercase + underscores. .to_s guards against nil for bots with no UA. robots.txt: public/robots.txt — Sinatra auto-serves public/ via Rack::Static before routing, no route definition needed. noai meta: ERB layout.erb with <%= @robots || 'noai, noimageai' %> and per-route @robots = 'index, follow' instance var override. Registration: use AiBotBlocker (classic-style top-level or inside Sinatra::Base subclass, or in config.ru). Rack FIFO order: first use = outermost = runs first — same as Express and Gin, opposite of Sinatra::Base's own stack. Route-scoped: Rack::URLMap + Rack::Builder to apply middleware only to /api prefix, leaving public / unblocked. Framework-agnostic: same AiBotBlocker class in Rails (config.middleware.use AiBotBlocker) and plain Rack config.ru unchanged.
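The header-to-env-key rule described above (HTTP_ prefix, uppercase, hyphens to underscores) comes from CGI, and Python's WSGI follows the same convention, so it can be illustrated outside Ruby. This is a small sketch of the naming rule and the nil-safe read; the function names are made up for the example.

```python
def header_to_env_key(name: str) -> str:
    """Map an HTTP header name to its CGI-style environment key."""
    key = name.upper().replace("-", "_")
    # Content-Type and Content-Length are the two CGI exceptions: no HTTP_ prefix.
    if key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
        return key
    return "HTTP_" + key


def user_agent_from_env(env) -> str:
    """Read the User-Agent, returning '' when the client sent none."""
    return str(env.get("HTTP_USER_AGENT") or "")
```

The empty-string fallback plays the same role as Ruby's `.to_s` on a nil value: substring checks stay safe for clients that send no User-Agent at all.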
How to Block AI Bots on Starlette: Complete 2026 Guide
Starlette is the ASGI toolkit that powers FastAPI — FastAPI is a Starlette subclass, so every Starlette middleware works identically in both. Bot blocking uses BaseHTTPMiddleware with a dispatch() override. robots.txt: PlainTextResponse route at /robots.txt (Starlette serves no robots.txt automatically) or StaticFiles mount from a static/ directory. noai meta: Jinja2Templates base.html with {{ robots | default('noai, noimageai') }} Jinja2 expression for per-page override. BaseHTTPMiddleware dispatch(): override async def dispatch(self, request, call_next) → Response. BLOCK: return Response('Forbidden', status_code=403) immediately — do NOT call await call_next(request). PASS THROUGH: response = await call_next(request); response.headers['X-Robots-Tag'] = 'noai, noimageai'; return response — unlike Go net/http, Starlette returns a Response object from call_next() so headers can be set after. request.headers.get('user-agent', '').lower() for safe case-insensitive UA access. EXEMPT_PATHS = {'/robots.txt', '/sitemap.xml', '/favicon.ico'} before UA check. Registration: app.add_middleware(AiBotBlocker) LIFO (last added = outermost, opposite of Express/Gin). Pure ASGI middleware: class AiBotBlockerASGI with __call__(scope, receive, send) for zero-overhead blocking; wraps send callable to inject X-Robots-Tag into http.response.start messages. Route-scoped: Mount('/api', routes=[...], middleware=[Middleware(AiBotBlocker)]) — Middleware() list is FIFO (first = outermost), inverse of add_middleware(). Framework-agnostic: pure ASGI middleware works with Starlette, FastAPI, Django ASGI, and Litestar unchanged.
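The pure ASGI variant mentioned above needs no framework imports at all, which is what makes it portable. The following is a minimal sketch written against the ASGI message format; the class name follows the one used in the summary, while the bot tokens are illustrative examples rather than a maintained list.

```python
AI_BOTS = ("gptbot", "claudebot", "ccbot")  # illustrative tokens, not a complete list
EXEMPT_PATHS = {"/robots.txt", "/sitemap.xml", "/favicon.ico"}


class AiBotBlockerASGI:
    """Pure ASGI middleware: blocks matching user agents, tags other responses."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Lifespan and websocket traffic passes through untouched.
            await self.app(scope, receive, send)
            return

        # ASGI headers are (name, value) byte pairs with lowercased names.
        headers = dict(scope.get("headers", []))
        ua = headers.get(b"user-agent", b"").decode("latin-1").lower()

        if scope["path"] not in EXEMPT_PATHS and any(bot in ua for bot in AI_BOTS):
            body = b"Forbidden"
            await send({
                "type": "http.response.start",
                "status": 403,
                "headers": [
                    (b"content-type", b"text/plain"),
                    (b"content-length", str(len(body)).encode()),
                ],
            })
            await send({"type": "http.response.body", "body": body})
            return  # the inner app is never called

        async def send_with_tag(message):
            # Inject the header into the response start message on the way out.
            if message["type"] == "http.response.start":
                tagged = list(message.get("headers", []))
                tagged.append((b"x-robots-tag", b"noai, noimageai"))
                message = {**message, "headers": tagged}
            await send(message)

        await self.app(scope, receive, send_with_tag)
```

Wrapping is a plain constructor call, for example `app = AiBotBlockerASGI(app)`, which is why the same class can sit in front of any ASGI application.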
How to Block AI Bots on Chi (Go): Complete 2026 Guide
Chi is a lightweight, idiomatic Go HTTP router built directly on net/http — it introduces zero custom types for middleware. Bot blocking uses the standard func(http.Handler) http.Handler signature, identical to net/http middleware. robots.txt: static/ directory served via r.Get("/robots.txt", func(w, r) { http.ServeFile(w, r, "static/robots.txt") }) or exempt /robots.txt in the middleware's exemptPaths map. noai meta: html/template layout with {{ .Robots }} variable and default "noai, noimageai" fallback; or add in your separate SPA frontend if Chi serves JSON only. Middleware: func(next http.Handler) http.Handler returns an http.HandlerFunc. BLOCK: http.Error(w, "Forbidden", http.StatusForbidden) + return — do NOT call next.ServeHTTP(). PASS THROUGH: w.Header().Set("X-Robots-Tag", "noai, noimageai") BEFORE next.ServeHTTP(w, r) — headers must be set before the inner handler calls WriteHeader() or Write(), which freezes the header map. This differs from Echo/Gin where setting headers after next works. No custom types: no *gin.Context, no echo.Context, no *fiber.Ctx — standard http.ResponseWriter + *http.Request. Any net/http middleware works with Chi out of the box. r.Use(AiBotBlocker) for global — FIFO order (first Use runs first). r.Route("/api", func(api chi.Router) { api.Use(...) }) for path-prefixed sub-routers. r.Group(func(r chi.Router) { r.Use(...) }) for inline groups at the same path level. chi.URLParam(r, "key") for route parameters. Chi middleware is interchangeable with net/http — zero vendor lock-in.
Read guide →
How to Block AI Bots on Fiber (Go): Complete 2026 Guide
Fiber is a Go web framework built on fasthttp, not net/http. Bot blocking uses a fiber.Handler middleware: func(c *fiber.Ctx) error. robots.txt: place in ./public/ directory and register with app.Static("/", "./public") BEFORE bot middleware, so the static handler answers /robots.txt first and the request never reaches the bot check. noai meta: html/template layout with {{ .Robots }} variable and default "noai, noimageai" fallback; or add in your separate frontend if Fiber serves JSON only. fiber.Handler middleware: single *fiber.Ctx holds both request and response. c.Get("User-Agent") reads request header; c.Set("X-Robots-Tag", "noai, noimageai") writes response header (mutable, no new object returned, unlike PSR-7). BLOCK: return c.Status(fiber.StatusForbidden).SendString("Forbidden"); do NOT call c.Next(). PASS THROUGH: err := c.Next(); c.Set("X-Robots-Tag", "noai, noimageai"); return err, so the error from c.Next() is propagated to Fiber's error handler rather than swallowed. EXEMPT_PATHS: map[string]bool for O(1) lookup. Registration: app.Use(middleware.AiBotBlocker). Fiber runs middleware in FIFO order (first registered runs first), so register early, before CORS/auth. Route-scoped: api := app.Group("/api", middleware.AiBotBlocker), so only routes under the group are protected. fasthttp context pooling: *fiber.Ctx is recycled via sync.Pool, so do NOT retain references to c or c.Body() beyond the handler; copy data if needed for goroutines. Compared to net/http (wrapper pattern) and Gin (c.AbortWithStatus + gin.Context), Fiber's single func(c *fiber.Ctx) error signature is the simplest but requires fasthttp awareness.
Read guide →
How to Block AI Bots on Slim 4 (PHP): Complete 2026 Guide
Slim 4 is a PHP micro-framework that adopts PSR-15 for middleware and PSR-7 for HTTP messages. Bot blocking uses a MiddlewareInterface class with a single process() method. robots.txt: public/robots.txt (document root served before PHP runs) or dynamic Slim route with $app->get('/robots.txt', ...). noai meta: PHP/Twig base layout with $robots ?? 'noai, noimageai' fallback variable. PSR-15 process() method: two parameters (no $response, unlike Slim 3) — $request (ServerRequestInterface) + $handler (RequestHandlerInterface). BLOCK: return new Response(403) without calling $handler->handle() — do NOT pass to next handler. PASS THROUGH: $response = $handler->handle($request); return $response->withAddedHeader('X-Robots-Tag', 'noai, noimageai'). PSR-7 IMMUTABILITY: withHeader() returns a NEW instance — must capture result or header is silently lost. str_contains() (PHP 8.0+) or strpos() !== false (PHP 7.4). EXEMPT_PATHS: in_array($path, EXEMPT, true). Registration: $app->addMiddleware(new AiBotBlocker()) global — add LAST so it runs outermost (Slim LIFO). Route-scoped: $app->group('/api', fn)->add(new AiBotBlocker()). Slim 3 vs 4: Slim 3 used 3-param callable ($request, $response, $next); Slim 4 uses PSR-15 ($request, $handler) — not backward compatible. PSR-15 middleware is reusable across Slim 4, Mezzio, Symfony.
Read guide →
How to Block AI Bots on CodeIgniter 4: Complete 2026 Guide
CodeIgniter 4 calls its request/response interceptors 'Filters' — classes implementing FilterInterface with before() and after() methods. robots.txt: public/robots.txt (document root — served by Apache/nginx before PHP runs, no route needed). noai meta: app/Views/layouts/ base template with <?= esc($robots ?? 'noai, noimageai') ?> and controller override. FilterInterface pattern: before() for hard 403 block — return service('response')->setStatusCode(403)->setBody('Forbidden') to short-circuit (return null to continue). after() for X-Robots-Tag — $response->setHeader('X-Robots-Tag', 'noai, noimageai'). No 'next()' call — returning null from before() is how control passes forward. AI_BOT_PATTERNS: const array of lowercase substrings; $ua = strtolower($request->getHeaderLine('User-Agent')); str_contains($ua, $pattern) (PHP 8.0+, use strpos for 7.4). EXEMPT_PATHS: in_array($path, EXEMPT_PATHS, true) check before UA match. Registration: app/Config/Filters.php — $aliases['aiBotBlocker'] = AiBotBlocker::class; $globals['before'][] = 'aiBotBlocker'. Route-scoped: $routes->group('api', ['filter' => 'aiBotBlocker'], fn) — use instead of $globals for API-only blocking. php spark filter:check GET /path — verify registration (CI 4.3+). Blocked bots never reach after() — X-Robots-Tag only on legitimate responses. CodeIgniter's document root is public/ — never place robots.txt in project root alongside app/.
Read guide →
How to Block AI Bots on Medusa.js: Complete 2026 Guide
Medusa.js v2 is an open-source headless commerce platform built on Express and Node.js — exposes /store/* and /admin/* REST APIs for your storefront. Bot blocking uses the defineMiddlewares() export in src/api/middlewares.ts: the only supported way to register middleware in Medusa v2. robots.txt: static/robots.txt at project root (alongside src/ and medusa-config.ts) — Medusa serves the static/ directory at the web root automatically, no route definition needed. noai meta: your storefront (Next.js Starter / Nuxt / SvelteKit) — Medusa returns JSON, not HTML; add the meta tag in your frontend's base layout. defineMiddlewares() pattern: export default defineMiddlewares({ routes: [{ matcher: '/**', middlewares: [aiBotBlocker] }] }) — Medusa auto-discovers src/api/middlewares.ts, no registration step. aiBotBlocker: MedusaRequest/MedusaResponse/MedusaNextFunction types (extend Express types); EXEMPT_PATHS Set for /robots.txt + /sitemap.xml; res.sendStatus(403) for AI bots (do NOT call next()); res.set('X-Robots-Tag', 'noai, noimageai') + next() for legitimate requests. Storefront-only blocking: matcher: '/store/**' to leave /admin/* and /hooks/* unblocked. Middleware ordering: place bot blocker first in routes array — blocked requests never reach CORS or auth middleware. Medusa v2 vs v1: v1 used src/api/index.ts loader pattern with direct app.use() calls; v2 replaced this with defineMiddlewares(). Default backend port: 9000.
Read guide →
How to Block AI Bots on Vapor (Swift): Complete 2026 Guide
Vapor is the most widely used server-side Swift framework: async/await-native, built on SwiftNIO, and deployable on Linux and macOS. Bot blocking uses the AsyncMiddleware protocol. robots.txt: place in Public/ directory, where FileMiddleware (app.middleware.use(FileMiddleware(publicDirectory: app.directory.publicDirectory))) auto-serves it; or define app.get("robots.txt") as a direct route with Response(status: .ok, headers: ["Content-Type": "text/plain"], body: .init(string: ROBOTS_TXT)). noai meta: Leaf base template (Resources/Views/base.leaf) with <meta name="robots" content="#(robots ?? "noai, noimageai")">; pass the robots key from the controller context for per-page override. AsyncMiddleware: struct AiBotMiddleware: AsyncMiddleware, whose respond(to:chainingTo:) is async throws. EXEMPT_PATHS check (Set<String>) before UA match. request.headers.first(name: .userAgent)?.lowercased() for case-insensitive matching. Hard 403: return Response(status: .forbidden) without calling next.respond(). X-Robots-Tag: response.headers.add(name: "X-Robots-Tag", value: "noai, noimageai") after try await next.respond(to: request). Global registration: app.middleware.use(AiBotMiddleware()) in configure.swift (after FileMiddleware). Route-grouped: let protected = app.grouped(AiBotMiddleware()), so only routes defined on protected are blocked. Deployment: Docker with swift:5.9-jammy base image; Fly.io, Railway, Render, Heroku Swift buildpack, any Linux VPS.
Read guide →
How to Block AI Bots with Java Servlet: Complete 2026 Guide
Java Servlet is the foundational HTTP API for Tomcat, Jetty, WildFly, and all Jakarta EE containers: the layer beneath Spring Boot, Spring MVC, and Jakarta Faces. Bot blocking uses a Filter implementation: doFilter() intercepts every request before any servlet executes. robots.txt: place in src/main/webapp/ (WAR) or src/main/resources/static/ (Spring Boot embedded). The file is served as a static resource, but a filter mapped to /* still sees the request, which is why the EXEMPT_PATHS check matters. noai meta: JSP base layout fragment with <meta name="robots" content="noai, noimageai" />, JSTL ${not empty robots ? robots : 'noai, noimageai'} expression for per-page override via request.setAttribute(). X-Robots-Tag: response.setHeader("X-Robots-Tag", "noai, noimageai") BEFORE chain.doFilter(), because headers set after the response has been committed are silently ignored. Hard 403: AI_BOTS.matcher(ua).find() → response.sendError(SC_FORBIDDEN) + return; must NOT call chain.doFilter() after blocking. EXEMPT_PATHS check (Set.of("/robots.txt", "/sitemap.xml", "/favicon.ico")) before UA check. Registration: @WebFilter("/*") annotation (Servlet 3.0+, Tomcat 7+, requires @ServletComponentScan in Spring Boot) OR web.xml <filter> + <filter-mapping> (all versions, explicit order control). javax.servlet vs jakarta.servlet namespace split: Tomcat 9 and earlier = javax.servlet.*; Tomcat 10+ = jakarta.servlet.*. Matching the container version is critical (wrong namespace = ClassNotFoundException at startup). Maven scope=provided (container supplies the API at runtime). Spring Boot: register as @Bean of type Filter, or add @ServletComponentScan for @WebFilter discovery.
Read guide →
How to Block AI Bots on Elysia: Complete 2026 Guide
Elysia is a Bun-native TypeScript web framework — plugin-first architecture, lifecycle hooks, and the fastest Node-alternative benchmarks. Bot blocking uses Elysia's hook system. robots.txt: Bun.file('./static/robots.txt') in a .get('/robots.txt') route (BunFile implements Blob — pass directly to Response constructor), or inline string constant for bun build --compile single-binary deployments. noai meta: @elysiajs/html plugin (.use(html())) — return JSX with <meta name="robots" content="noai, noimageai" /> in <head>; or html template literal for plain string responses. X-Robots-Tag: .onAfterHandle() lifecycle hook — context.set.headers['X-Robots-Tag'] = 'noai, noimageai'. Runs after handler, before response is sent. Selective: check context.set.headers['Content-Type'] to skip API routes. Hard 403: .onBeforeHandle() global hook — check request.headers.get('user-agent') against AI_BOTS regex; set.status = 403; return 'Forbidden'. EXEMPT_PATHS: parse URL from request.url, check pathname against /robots.txt + /sitemap.xml + /favicon.ico before bot check. Plugin pattern (recommended): export const aiBotBlock = new Elysia({ name: 'ai-bot-block' }).onBeforeHandle(...) — name property enables Elysia singleton deduplication (plugin registered once even if .use() called multiple times). Guard pattern: .guard({ beforeHandle: [aiBotCheck] }, (app) => app.get('/api/...')) for route-scoped blocking without affecting public routes. Deployment: Bun runtime required (oven/bun Docker image); Fly.io, Railway, Render, and any Docker host with Bun support.
Read guide →
How to Block AI Bots on AWS: Complete 2026 Guide
AWS is an infrastructure platform, so bot blocking spans multiple services. robots.txt: S3 object served through CloudFront CDN via aws s3 cp --content-type 'text/plain', or an Express/Lambda route. noai meta: application-layer SSR (Lambda handler HTML injection). X-Robots-Tag: CloudFront Response Headers Policy (no code, console or Terraform, CDN-level). Hard 403: two options. CloudFront Functions: lightweight cloudfront-js-2.0 runtime, sub-ms, $0.10/M req, viewer-request trigger, var/indexOf syntax. Lambda@Edge: full Node.js, event.Records[0].cf.request, $0.60/M req, supports async calls, and must be created in us-east-1 regardless of origin region. CloudFront UA normalization: CloudFront collapses User-Agent by default for cache efficiency, so the cache policy must include UA in the cache key for full string access in CloudFront Functions. Lambda@Edge receives original headers without this workaround. AWS WAF Bot Control: managed rule group (AWS-maintained, auto-updates), Terraform scope='CLOUDFRONT', inspection_level='COMMON' or 'TARGETED', ~$10/mo base + $1/M req. API Gateway without CloudFront: Lambda authorizer (REQUEST type) or early return in Lambda handler. Comparison: CF Functions (simplest/cheapest, limited JS runtime), Lambda@Edge (full Node.js, more powerful, higher cost), WAF Bot Control (no code, automatic updates, most expensive).
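To make the cost comparison concrete, the per-request prices quoted above can be turned into a rough monthly estimate. This is a back-of-envelope sketch only: it uses the figures from the summary as given, the function name is hypothetical, and it ignores charges the summary does not mention (for example Lambda@Edge execution duration or other WAF fees).

```python
def monthly_request_cost(requests_millions: float) -> dict:
    """Estimate request charges in USD from the per-million prices quoted above."""
    return {
        # $0.10 per million invocations
        "cloudfront_functions": round(0.10 * requests_millions, 2),
        # $0.60 per million requests (duration charges not included)
        "lambda_at_edge": round(0.60 * requests_millions, 2),
        # ~$10 monthly base plus $1 per million requests
        "waf_bot_control": round(10.00 + 1.00 * requests_millions, 2),
    }
```

At 10 million requests a month, the request charges come to about $1, $6, and $20 respectively, which matches the ordering in the comparison above.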
Read guide →
Does Blocking AI Bots Hurt SEO? The Complete Answer (2026)
Short answer: no. Blocking AI training bots has zero effect on Google Search rankings. The fundamental reason: SEO crawlers (Googlebot, Bingbot) and AI training bots (GPTBot, ClaudeBot, CCBot) are separate programs with different user agents, so blocking GPTBot does not affect Googlebot. Google-Extended is Google's AI training token (safe to block); Googlebot is the search indexing crawler (never block). Wildcard Disallow is the one mistake that CAN hurt SEO: User-agent: * / Disallow: / blocks Googlebot too, so always use named user-agent tokens. noai meta directive: not processed by Google Search, zero ranking effect. X-Robots-Tag: noai: not recognised by Google Search for indexing purposes, zero effect on rankings. 403 responses to AI bots: invisible to Google, because Googlebot gets a 200 while AI bots get the 403, so Google ranking systems never see the 403s. The real trade-off: AI search engines (PerplexityBot, OAI-SearchBot, YouBot) are distinct from training bots. Blocking them reduces Perplexity/SearchGPT visibility, which is not an SEO issue but an AEO (Answer Engine Optimization) question. Selective blocking strategy: block training crawlers (GPTBot/ClaudeBot/CCBot/Bytespider/Google-Extended) while allowing AI search indexers (PerplexityBot/OAI-SearchBot) for AEO visibility without training data contribution. Summary table: what each bot sees, what gets blocked, SEO effect.
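The difference between named tokens and a wildcard Disallow can be checked locally with Python's standard-library robots.txt parser. This is a small sketch under stated assumptions: the two robots.txt bodies and the allowed() helper are illustrative, and urllib.robotparser only approximates how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Named tokens: only the listed training crawlers are disallowed.
SELECTIVE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

# The mistake described above: a wildcard rule that also catches Googlebot.
WILDCARD_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str = "https://example.com/page") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With the selective file, allowed(SELECTIVE_ROBOTS_TXT, "Googlebot") stays true while the named training crawlers are refused; with the wildcard file, Googlebot is refused as well.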
Read guide →
AI Bot User Agents List 2026 — Complete Reference
Complete reference list of AI crawler and AI training bot user agent strings in 2026. 20 bots listed: OpenAI (GPTBot, ChatGPT-User, OAI-SearchBot), Anthropic (ClaudeBot, anthropic-ai, Claude-Web), Google (Google-Extended — NOT Googlebot, which must not be blocked), CCBot (Common Crawl, used by GPT/Llama/Mistral training datasets), Bytespider (ByteDance), Applebot-Extended (Apple Intelligence training — distinct from base Applebot which is Apple Search), PerplexityBot, Diffbot, cohere-ai, FacebookBot (Meta/Llama), Amazonbot (Alexa AI), omgili/omgilibot (Webz.io data aggregation), iaskspider (iAsk.ai), YouBot (You.com), img2dataset (HuggingFace image datasets). Detection: case-insensitive substring matching — lowercase the User-Agent header and check contains(). Never use exact matching — UA strings include version numbers. Code samples: JavaScript/TypeScript, Python, PHP, Go — all with the same pattern. robots.txt: list each bot individually with its own User-agent token + Disallow — wildcard Disallow would block Googlebot. robots.txt compliance: OpenAI/Anthropic/Google/Apple/Perplexity = yes; CCBot/Bytespider/Diffbot = variable; img2dataset = no. Update cadence: 2-4 new bot UAs per year. Check OpenAI/Anthropic/Google docs and server access logs monthly. Quick-copy array for direct use in middleware.
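The detection rule described above (lowercase the header, then check for a substring) fits in a few lines. The sketch below is one possible shape rather than the guide's own sample: the pattern list is transcribed from the names in this summary, and the function name is_ai_bot is an assumption.

```python
from typing import Optional

# Lowercase substrings taken from the bot names listed above.
# "omgili" also matches "omgilibot".
AI_BOT_PATTERNS = [
    "gptbot", "chatgpt-user", "oai-searchbot",
    "claudebot", "anthropic-ai", "claude-web",
    "google-extended", "ccbot", "bytespider", "applebot-extended",
    "perplexitybot", "diffbot", "cohere-ai", "facebookbot", "amazonbot",
    "omgili", "iaskspider", "youbot", "img2dataset",
]


def is_ai_bot(user_agent: Optional[str]) -> bool:
    """Case-insensitive substring match; a missing header counts as not a bot."""
    ua = (user_agent or "").lower()
    return any(pattern in ua for pattern in AI_BOT_PATTERNS)
```

Substring matching is what keeps versioned strings such as "GPTBot/1.2" matching, and it leaves Googlebot and base Applebot alone because neither contains a listed pattern.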
Read guide →
How to Block AI Bots on Netlify: Complete 2026 Guide
Netlify is a deployment platform — bot blocking spans Netlify CDN config and Edge Functions. robots.txt: public/robots.txt (CDN-served) in publish directory; static/robots.txt for SvelteKit (auto-copied to dist). noai meta: framework-specific — Next.js Metadata API, SvelteKit <svelte:head>, Astro <head>, Nuxt defineNuxtConfig app.head. X-Robots-Tag: _headers file in publish directory (path + header format, no config required) OR netlify.toml [[headers]] with for = "/*" and [headers.values] (repo root, structured, regex-capable). Both are CDN-level, framework-agnostic. _headers takes precedence when both are set for the same path. Hard 403: Netlify Edge Functions (Deno runtime) — netlify/edge-functions/block-bots.ts. handler(request: Request, context: Context). context.next() to pass through, return new Response('Forbidden', { status: 403 }) to block. Inline config: export const config: Config = { path: '/*' } — no netlify.toml [[edge_functions]] entry needed. AI_BOTS list (lowercase), EXEMPT_PATHS check first (/robots.txt must stay accessible). Combined: await context.next() then response.headers.set() for X-Robots-Tag on legitimate traffic. Netlify Functions vs Edge Functions: Functions = Lambda/API routes only (wrong tool for bot blocking). Edge Functions = global edge nodes, Deno, ~0ms cold start, intercepts page requests (correct tool). Deno runtime: no require(), standard Web APIs (Request/Response/URL global), TypeScript native. Cross-platform: same UA-matching logic works on Vercel/Netlify — just swap the wrapper.
Read guide →
How to Block AI Bots on Vercel: Complete 2026 Guide
Vercel is a deployment platform, not a framework, so bot blocking spans two layers: Vercel infrastructure and your framework code. robots.txt: public/robots.txt (CDN-served) for any framework, OR Next.js app/robots.ts MetadataRoute.Robots for programmatic generation (SSG at build time). noai meta: Next.js Metadata API robots field in layout.tsx (site-wide) or generateMetadata per-page. Per-page override wins. X-Robots-Tag: vercel.json headers array (CDN-level, framework-agnostic, no code) OR next.config.ts headers() function (conditional per-path). Hard 403: middleware.ts at project root. Edge Middleware runs on Vercel Edge Network (not Lambda), ~1-5ms overhead, before framework executes. AI_BOTS list (lowercase), EXEMPT_PATHS check first (/robots.txt must remain accessible), matcher config excludes _next/static + _next/image. NextRequest/NextResponse come from next/server and apply to Next.js projects. Combined middleware: set X-Robots-Tag on NextResponse.next() for legitimate requests, return 403 Response for bots. Vercel Firewall (Pro/Enterprise): WAF in dashboard; no code, no redeploy. Rules: User-Agent contains GPTBot → Block. Runs before Edge Middleware. vercel.json firewall.rules (rolling out). Edge Config: @vercel/edge-config with get('aiBots') in middleware; updates propagate in under 300ms without redeployment. Framework compatibility: the same middleware.ts pattern works for Next.js/SvelteKit/Nuxt/Astro/Remix; outside Next.js, use standard Web API Request/Response types instead of NextRequest/NextResponse.
Read guide →
How to Block AI Bots on October CMS: Complete 2026 Guide
October CMS is a Laravel-based open-source CMS with Twig templating and a plugin architecture — popular in the PHP ecosystem as a lighter Craft/WordPress alternative. robots.txt: public/robots.txt (Apache/Nginx serve before PHP — zero CMS overhead). Dynamic: create CMS page with url = "/robots.txt" + {% header "Content-Type: text/plain; charset=utf-8" %} in Twig — page body is the file contents. Static takes priority if both exist. noai meta: themes/mytheme/layouts/default.htm — <meta name="robots" content="{{ this.page.viewBag.robots ?? 'noai, noimageai' }}">. this.page.viewBag exposes the [viewBag] INI section from each CMS page's configuration block. Per-page: add robots = "index, follow" under [viewBag] in the page's Settings → Page Properties — blank falls through to layout default. X-Robots-Tag and hard 403: AiBotMiddleware.php Laravel middleware. handle() method: getallheaders() + array_change_key_case, check User-Agent against bot list. 403 path: abort(403) or response('Forbidden', 403). X-Robots-Tag path: $next($request) then $response->header('X-Robots-Tag', 'noai, noimageai'). Registration: Plugin.php boot() method — app()->make(\Illuminate\Contracts\Http\Kernel::class)->pushMiddlewareToGroup('web', AiBotMiddleware::class). EXEMPT_PATHS: /robots.txt, /sitemap.xml, /favicon.ico — check before UA match. Apache .htaccess: RewriteCond %{HTTP_USER_AGENT} chain with [NC,OR] flags, final [F,L] to return 403. Nginx: map $http_user_agent $is_ai_bot block + if ($is_ai_bot) { return 403; } in server block. October CMS v3 uses Laravel 10+, v2 uses Laravel 9 — middleware API identical. Plugins in plugins/myauthor/myplugin/; Plugin.php is the entry point; middleware lives in plugins/myauthor/myplugin/middleware/.
Read guide →
How to Block AI Bots on Analog (Angular): Complete 2026 Guide
Analog is the full-stack Angular meta-framework — file-based routing, SSR, API routes, and a Nitro-powered server (same as Nuxt and TanStack Start). robots.txt: public/robots.txt (Vite static) or src/server/routes/robots.txt.ts (Nitro route with defineEventHandler + text/plain Content-Type). Nitro route takes priority over public/. noai meta: two approaches. Site-wide: inject Meta from @angular/platform-browser in AppComponent, call this.meta.addTag({ name: 'robots', content: 'noai, noimageai' }) in constructor. Per-route: export routeMeta: RouteMeta = { meta: [{ name: 'robots', content: 'index, follow' }] } from page file — @analogjs/router, colocated with component, works with SSR. Dynamic: Meta.updateTag in ngOnInit after route data resolves. routeMeta vs Meta: routeMeta is static/file-based (SSR-ready), Meta service is runtime/programmatic (dynamic values). X-Robots-Tag: vite.config.ts analog({ nitro: { routeRules: { '/**': { headers: { 'X-Robots-Tag': 'noai, noimageai' } } } } }) — Nitro applies before Angular SSR. Exclude /api/** override. Hard 403: src/server/middleware/bot-block.ts — defineEventHandler, getHeader, throw createError({ statusCode: 403 }). Nitro auto-discovers middleware alphabetically. Combined middleware: block + setResponseHeader in one file. Nitro shared with Nuxt/TanStack Start/SolidStart — same server/middleware/ convention, same routeRules, same adapters. vite.config.ts preset: vercel/netlify/cloudflare-pages/node-server/firebase.
Read guide →
How to Block AI Bots on Vike (vite-plugin-ssr): Complete 2026 Guide
Vike (formerly vite-plugin-ssr) is a Vite-based SSR framework with minimal magic — bring your own server (Express/Fastify/Hono), UI framework (React/Vue/Solid/Preact), and routing. No built-in middleware system: bot blocking is plain HTTP server middleware before renderPage(). robots.txt: public/robots.txt (Vite static) OR dedicated server route before renderPage() catch-all — app.get('/robots.txt', ...). Express.static('public') must come first so robots.txt bypasses bot check. noai meta: pages/+Head.tsx (React/Solid) or pages/+Head.vue (Vue) at pages/ root — applies to all pages. Per-page: +Head in page directory (both render; browser uses last matching meta). Data-driven: useData() from vike-react in +Head, value loaded by +data.ts. X-Robots-Tag: set after renderPage() in catch-all handler (headers.forEach + res.setHeader). Fastify: addHook('onSend') checks content-type. Hono: c.res.headers.set after await next(). Hard 403: middleware BEFORE renderPage() — isAIBot(ua) + res.status(403).send(). Vike never called — no SSR, no DB queries. EXEMPT_PATHS check first so /robots.txt accessible. Fastify: addHook('preHandler') for same pattern. Framework differences: server middleware identical for React/Vue/Solid — only +Head file extension differs (.tsx vs .vue). Deployment: Node/Express, Fastify, Hono (Cloudflare Workers native), Vercel, Netlify — all full support. Vike adapters: vike-vercel, vike-cloudflare.
Read guide →
How to Block AI Bots on TanStack Start: Complete 2026 Guide
TanStack Start is a full-stack React framework on TanStack Router + Vinxi + Nitro — two layers: router (React SSR, file-based routes in app/routes/) and Nitro server (HTTP, server/ directory). robots.txt: public/robots.txt (Vite static) or server/routes/robots.txt.ts Nitro server route — defineEventHandler, setHeader Content-Type text/plain. Server routes take priority over public/ when paths conflict. noai meta: head() option on createRootRoute in app/routes/__root.tsx — head: () => ({ meta: [{ name: 'robots', content: 'noai, noimageai' }] }). <Meta /> component renders all collected meta. Per-route override: head() on createFileRoute() — leaf route wins for duplicate names. Data-driven: head: ({ loaderData }) => ({ meta: [{ name: 'robots', content: loaderData.robots }] }). X-Robots-Tag: app.config.ts routeRules — server: { routeRules: { '/**': { headers: { 'X-Robots-Tag': 'noai, noimageai' } } } }. Nitro applies at server level before React rendering. Selective: /api/** route rule override. Hard 403: server/middleware/bot-block.ts — defineEventHandler from h3, getHeader(event, 'user-agent'), throw createError({ statusCode: 403 }). Nitro auto-loads all server/middleware/ files alphabetically. EXEMPT_PATHS check before UA check. createMiddleware vs Nitro: @tanstack/start createMiddleware runs only in createServerFn() context (NOT on page requests) — must use Nitro middleware for global bot blocking. Deployment: app.config.ts preset (vercel/netlify/cloudflare-pages/node-server/bun) — Nitro adapter-agnostic, same middleware works everywhere.
Read guide →
How to Block AI Bots on Umbraco: Complete 2026 Guide
Umbraco is the most popular open-source .NET CMS — 750,000+ websites, enterprise/government/agency. Built on ASP.NET Core with Razor templates, flexible Document Types, and optional headless Content Delivery API. robots.txt: wwwroot/robots.txt (UseStaticFiles serves before Umbraco routing — zero CMS overhead). Dynamic: Surface Controller with [Route('/robots.txt')] action + ContentResult — static file takes priority so remove wwwroot/robots.txt when using controller. noai meta: Views/Shared/_Layout.cshtml with @inherits UmbracoViewPage — Model.HasValue('robotsTag') ? Model.Value<string>('robotsTag') : 'noai, noimageai'. Per-document: Text Box property 'robotsTag' added to Document Type in backoffice (Settings → Document Types) — editors set per-page value, blank = default. X-Robots-Tag: ASP.NET Core middleware registered in Program.cs after BootUmbracoAsync() but before UseUmbraco(). Response.OnStarting() to set headers (can't set after response started). Composition-based: IComposer class in Composers/ — Umbraco auto-discovers via reflection, no Program.cs edit needed. Hard 403: middleware checks UA before _next(); EXEMPT_PATHS includes /robots.txt + /sitemap.xml + /umbraco (backoffice!). Backoffice path customised via appsettings.json Umbraco:CMS:Path — must match exempt path. Content Delivery API: runs at /umbraco/delivery/api/v2/ — middleware blocks AI bots from API too. Exempt it for legitimate use or block separately. API key auth: Umbraco:CMS:DeliveryApi:PublicAccess false. IIS: web.config customHeaders as fallback. Umbraco Cloud: Git deploy, environment-specific appsettings, Azure-hosted.
Read guide →
How to Block AI Bots on RedwoodJS: Complete 2026 Guide
RedwoodJS is a full-stack React framework with a strict web/api split — React SPA on CDN, GraphQL+Prisma on Fastify serverless. This split shapes bot blocking: web side uses platform features, api side uses Fastify hooks. robots.txt: web/public/robots.txt (copied to build output, CDN-served on Netlify/Vercel). Dynamic via Redwood Function (api/src/functions/robots.ts) + _redirects: /robots.txt /.redwood/functions/robots 200. noai meta: Metadata component from @redwoodjs/web in layout — <Metadata robots='noai, noimageai' /> injects into document head. Per-page override: Metadata in page component (page-level takes precedence). MetaTags alternative for older Redwood. X-Robots-Tag: Netlify _headers (web/public/_headers /* rule — copied to build), netlify.toml [[headers]], Vercel vercel.json headers array, Cloudflare Pages _headers. Hard 403 web side: Netlify Edge Function netlify/edge-functions/bot-block.ts (path='/*' config export), Vercel middleware.ts at project root (Next.js middleware API). Hard 403 api side: Fastify addHook('preHandler') in api/server.config.ts for self-hosted — reply.code(403).send(). Serverless function: UA check before processing. Redwood v8+ middleware: api/src/middleware/botBlock.ts. EXEMPT_PATHS pattern for /robots.txt access. Deployment: Netlify (best DX — _headers + Edge Functions), Vercel, Cloudflare Pages, Render, self-hosted Fastify.
Read guide →
How to Block AI Bots on Craft CMS: Complete 2026 Guide
Craft CMS is the agency-favourite PHP CMS — built on Yii2, with Twig templates, flexible field layouts, and no bundled SEO plugin. robots.txt: static web/robots.txt (Apache/Nginx serve without touching PHP — fastest option). Dynamic Twig: templates/robots.txt.twig + config/routes.php URL rule 'robots.txt' => ['template' => 'robots.txt'] — requires {% header "Content-Type: text/plain" %} to override HTML default. Multi-site: craft.app.sites.currentSite.handle check for per-site rules. noai meta: base Twig layout {{ robots ?? 'noai, noimageai' }} with Twig nullish coalescing. Per-entry field: Plain Text field 'robotsTag' added to section field layout — entry.robotsTag ?? robots ?? 'noai, noimageai' in layout. Craft field handles: camelCase from name. X-Robots-Tag: Apache mod_headers Header always set, Nginx add_header always (applies to all status codes). Hard 403 — Apache: RewriteCond %{HTTP_USER_AGENT} GPTBot [NC,OR] chain + RewriteRule [F,L]. Nginx: map $http_user_agent $is_ai_bot { ~*GPTBot 1; ... } + if ($is_ai_bot) { return 403; }. Craft module: modules/AiBotModule/Module.php — getRequest()->getUserAgent() check in init(), Craft::$app->end() with 403. Register in config/app.php bootstrap. Server vs PHP: server-level faster (no PHP), module gives Craft context (exempt CP users). SEOmatic plugin: Plugin Settings → Robots → robots.txt Template for Disallow rules; Additional Meta Tags for noai (built-in Robots dropdown lacks noai option). Deployment: Apache/Nginx+PHP-FPM, Craft Cloud (custom Nginx directives via CP), Ploi/Forge, Docker.
Read guide →
How to Block AI Bots on AdonisJS: Complete 2026 Guide
AdonisJS is a batteries-included Node.js MVC framework — TypeScript-first, with its own ORM (Lucid), Edge templating engine, IoC container, and structured middleware system. Closer to Laravel than Express. robots.txt: place in public/ directory — AdonisJS static middleware serves it automatically (no config). Dynamic route: router.get('/robots.txt', async ({ response }) => { response.type('text/plain'); return '...' }) — routes take priority over static files. noai meta: Edge base layout resources/views/layouts/main.edge — {{ robots ?? 'noai, noimageai' }} with nullish coalescing (all JS operators work in Edge). Controller passes robots value: return view.render('pages/home', { robots: 'index, follow' }). Global middleware: node ace make:middleware AiBotBlock → app/middleware/ai_bot_block_middleware.ts — handle(ctx, next) method. ctx.request.header('user-agent') for UA check. ctx.response.status(403).send('Forbidden') to block. Register in start/kernel.ts server.use([]) array. server.use runs before routing (all requests); router.use runs only after route match — use server.use for bot blocking. EXEMPT_PATHS: skip /robots.txt, /sitemap.xml so crawlers can read them. ctx.logger.warn({ ua, path }) for structured Pino JSON logs. Named middleware: register in router.named({ botBlock: () => import('#middleware/ai_bot_block_middleware') }) — apply per-route with .use(middleware.botBlock()). Middleware order: place after static_middleware so public/ files bypass bot check. Deployment: node ace build → output in build/ → node build/bin/server.js. Docker multi-stage, Fly.io fly.toml, Railway env vars (NODE_ENV + APP_KEY from node ace generate:key), Heroku, Render — all full support.
Read guide →
How to Block AI Bots on Wagtail: Complete 2026 Guide
Wagtail is the most popular Django-based CMS — used by NASA, Google, Mozilla, and thousands of agencies. It sits on top of Django's request/response pipeline with its own page-serving layer and hooks. robots.txt: Django TemplateView wired in urls.py (path BEFORE Wagtail catch-all) or Nginx static file for high traffic. Dynamic view checks settings.DEBUG for staging vs production. noai meta tag: add to base.html template — {{ page.robots_tag|default:'noai, noimageai' }} (Django) or {{ page.robots_tag if page.robots_tag else 'noai, noimageai' }} (Jinja2). Wagtail's promote_panels: add robots_tag CharField to base Page model (blank=True, default='') with FieldPanel in MultiFieldPanel — editors set per-page overrides in the Promote tab alongside SEO title and search description. before_serve_page hook: @hooks.register('before_serve_page') in wagtail_hooks.py — return HttpResponse('Forbidden', status=403) to block; return None to continue. Hook fires only for Wagtail page requests, has access to the page object. after_serve_page hook: @hooks.register('after_serve_page') to add X-Robots-Tag — mirrors per-page robots_tag field value. Django middleware: AIBotMiddleware class in myapp/middleware.py — process_request returns 403, process_response sets X-Robots-Tag. Runs for ALL requests (admin, API, custom views, Wagtail pages). EXEMPT_PATHS list for /robots.txt, /admin/, Wagtail admin path. Middleware placement: early in MIDDLEWARE list, before wagtail.middleware.SiteMiddleware. Deployment: Nginx+Gunicorn (robots.txt served at Nginx level, zero Django overhead), Wagtail Cloud, Heroku, Railway, Docker, Fly.io — all support all 4 protection layers.
Read guide →
How to Block AI Bots on Fresh (Deno): Complete 2026 Guide
Fresh is Deno's official web framework — Preact rendering, island architecture (partial hydration), file-based routing under routes/. Unlike static site generators, Fresh has a runtime server with built-in middleware. robots.txt: static/robots.txt served automatically, or routes/robots.txt.ts dynamic handler (route wins over static — returns new Response() with text/plain). DENO_DEPLOYMENT_ID env var for production detection. noai meta: routes/_app.tsx <head> for app-wide default; per-route override via <Head> component from $fresh/runtime.ts (Fresh injects into head, browsers use last matching meta). Data-driven: handler passes robots value to component via ctx.render(). X-Robots-Tag: routes/_middleware.ts — async handler(req, ctx) calls ctx.next(), sets resp.headers.set('X-Robots-Tag', 'noai, noimageai'). Middleware scope: _middleware.ts applies to directory + all subdirectories. Selective: check content-type for text/html to skip API routes. Hard 403: same _middleware.ts — check UA against AI_BOT_PATTERNS array before ctx.next(), return new Response('Forbidden', { status: 403 }). Bot check runs before route handler. console.log for Deno Deploy dashboard logging. Islands: SSR means meta tags always in initial HTML — no JS execution needed for crawlers. Deno Deploy: primary target, zero config — push to GitHub, set entrypoint to main.ts. All middleware runs at edge. Deployment: all platforms (Deno Deploy, Docker, Fly.io, Railway, AWS Lambda) support all 4 layers because Fresh has a runtime server.
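The routes/_middleware.ts flow described above could be sketched as follows. `MiddlewareContextLike` is a hypothetical stand-in for the context object Fresh passes in (only `next()` is used), and the bot list is abbreviated.

```typescript
const AI_BOT_PATTERN = /GPTBot|ClaudeBot/i;

// Hypothetical stand-in for the Fresh middleware context.
interface MiddlewareContextLike {
  next(): Promise<Response>;
}

function isAiBot(ua: string | null): boolean {
  return ua !== null && AI_BOT_PATTERN.test(ua);
}

export async function handler(
  req: Request,
  ctx: MiddlewareContextLike,
): Promise<Response> {
  const ua = req.headers.get("user-agent");

  // Hard 403: runs before any route handler.
  if (isAiBot(ua)) {
    console.log(`blocked AI bot: ${ua}`);
    return new Response("Forbidden", { status: 403 });
  }

  const resp = await ctx.next();

  // Selective header: only tag HTML responses, skip API routes.
  const contentType = resp.headers.get("content-type") ?? "";
  if (contentType.includes("text/html")) {
    resp.headers.set("X-Robots-Tag", "noai, noimageai");
  }
  return resp;
}
```

Because the check happens before `ctx.next()`, a blocked request never reaches rendering.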
Read guide →
How to Block AI Bots on Qwik: Complete 2026 Guide
Qwik is a resumable JavaScript framework by Builder.io — eliminates hydration by serialising component state into HTML. Qwik City is its meta-framework: file-based routing, server-side middleware via onRequest handlers, DocumentHead for meta management, and deployment adapters. robots.txt: public/robots.txt for static, or src/routes/robots.txt/index.ts endpoint for dynamic (onGet handler returns text/plain Response via send()). noai meta tag: export head: DocumentHead from layout.tsx for site-wide default ({ meta: [{ name: 'robots', content: 'noai, noimageai' }] }); per-route head export overrides layout — same-name meta tags resolved by depth (deepest wins). Dynamic head via resolveValue for content-level control. X-Robots-Tag: plugin@headers.ts in src/routes/ — onRequest sets headers.set('X-Robots-Tag', 'noai, noimageai') + await next(). Plugin files (plugin@name.ts) apply to directory and all subdirectories; multiple plugins run alphabetically. Hard 403: plugin@bot-block.ts — onRequest checks UA, calls send(new Response('Forbidden', { status: 403 })) to short-circuit (no next() after send). Static adapter (SSG): onRequest does NOT run — use host-level headers (netlify.toml, vercel.json, _headers) and Edge Functions. Deployment: Node/Cloudflare/Netlify/Vercel/Deno adapters all support full onRequest middleware; static adapter requires hosting config.
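A minimal sketch of an `onRequest` plugin along the lines described above. It folds the bot check and the header into one handler for brevity, whereas the guide splits them into plugin@bot-block.ts and plugin@headers.ts. `RequestEventLike` is a hypothetical stand-in that declares only the fields used here, and the bot list is abbreviated.

```typescript
const AI_BOT_PATTERN = /GPTBot|ClaudeBot/i;

// Hypothetical stand-in for the request event passed to onRequest.
interface RequestEventLike {
  request: Request;
  headers: Headers; // response headers
  send(response: Response): unknown;
  next(): Promise<void>;
}

function isAiBot(ua: string | null): boolean {
  return ua !== null && AI_BOT_PATTERN.test(ua);
}

export const onRequest = async (ev: RequestEventLike): Promise<void> => {
  if (isAiBot(ev.request.headers.get("user-agent"))) {
    // Short-circuit: send the response and do not call next().
    ev.send(new Response("Forbidden", { status: 403 }));
    return;
  }
  ev.headers.set("X-Robots-Tag", "noai, noimageai");
  await ev.next();
};
```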
Read guide →
How to Block AI Bots on Zola: Complete 2026 Guide
Zola is a fast, opinionated static site generator written in Rust — single binary, no dependencies, Tera templates, TOML front matter. robots.txt: place in static/ directory — Zola copies everything in static/ to public/ automatically, no config needed. noai meta tag: edit templates/base.html — use {% if page.extra.robots %} chain with fallback to config.extra.default_robots, then hardcoded 'noai, noimageai'. Tera syntax: {{ page.extra.robots | default(value='noai, noimageai') }} (named parameter required — positional args error). Per-page override: [extra] robots = "index, follow" in TOML front matter (+++). Custom fields MUST go in [extra] table — top-level unknown keys cause build error. Section override: content/blog/_index.md with [extra] robots. Site-wide default: config.toml [extra] default_robots = "noai, noimageai" — no built-in front matter defaults cascade like Hugo/Jekyll, use config.extra + template fallbacks. X-Robots-Tag: Netlify netlify.toml [[headers]] with ZOLA_VERSION pinned; Vercel vercel.json (no auto-detect — specify build command); Cloudflare Pages static/_headers → public/_headers (native Zola support). Hard 403: Netlify Edge Function; Vercel middleware.ts at project root; Cloudflare Pages functions/_middleware.ts. Deployment table: Netlify ✓✓✓✓ (best native support), Vercel ✓✓✓✓, Cloudflare Pages ✓✓✓✓, GitHub Pages ✓✓✗✗, Fly.io ✓✓✓✓.
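For the Cloudflare Pages option, a functions/_middleware.ts file might look roughly like this. `PagesContextLike` is a hypothetical stand-in for the context type Cloudflare provides, the /robots.txt exemption is an assumption added here, and the bot list is abbreviated.

```typescript
const AI_BOT_PATTERN = /GPTBot|ClaudeBot/i;

// Hypothetical stand-in for the Pages Functions context object.
interface PagesContextLike {
  request: Request;
  next(): Promise<Response>;
}

function shouldBlock(ua: string | null, pathname: string): boolean {
  if (pathname === "/robots.txt") return false; // assumption: keep robots.txt readable
  return ua !== null && AI_BOT_PATTERN.test(ua);
}

export async function onRequest(context: PagesContextLike): Promise<Response> {
  const { request } = context;
  const { pathname } = new URL(request.url);

  if (shouldBlock(request.headers.get("user-agent"), pathname)) {
    return new Response("Forbidden", { status: 403 });
  }

  // Serve the static asset, then copy it so the headers can be changed.
  const asset = await context.next();
  const headers = new Headers(asset.headers);
  headers.set("X-Robots-Tag", "noai, noimageai");
  return new Response(asset.body, {
    status: asset.status,
    statusText: asset.statusText,
    headers,
  });
}
```

The functions/ directory sits next to the Zola project files rather than inside public/, so the build output stays purely static.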
Read guide →
How to Block AI Bots on Bridgetown: Complete 2026 Guide
Bridgetown is a modern Ruby static site generator — ERB by default (not Liquid), config in bridgetown.config.yml (not _config.yml). robots.txt: create src/robots.txt with Disallow: directives — Bridgetown copies it to output/ automatically. Dynamic version: src/robots.txt.erb with permalink: /robots.txt in front matter (ERB template wins over static file). noai meta tag: edit layouts/default.erb — use <%= resource.data.robots || 'noai, noimageai' %> (resource.data accessor, not page.robots — common mistake). Liquid alternative: {{ resource.data.robots | default: 'noai, noimageai' }}. Per-page override: robots: index, follow in YAML front matter. Global default via front_matter_defaults in bridgetown.config.yml: scope type: pages + values robots: noai, noimageai. Builder API plugin: plugins/builders/robots_defaults.rb — class RobotsDefaults < SiteBuilder; def build; hook :resources, :pre_render do |resource|; resource.data.robots ||= 'noai, noimageai'; end; end; end. X-Robots-Tag: _headers file at src/_headers (auto-copied unlike Lume) — /* rule with X-Robots-Tag: noai, noimageai. Hard 403: Netlify Edge Function netlify/edge-functions/bot-block.ts; Vercel middleware.ts in project root; Cloudflare Pages functions/_middleware.ts. Deployment table: Netlify ✓✓✓✓, Vercel ✓✓✓✓, Cloudflare Pages ✓✓✓✓, GitHub Pages ✓✓✗✗ (no headers/edge), Render ✓✓✓✗.
Read guide →
How to Block AI Bots on Lume: Complete 2026 Guide
Lume is a fast, flexible static site generator built on Deno — no Node.js, no npm. It supports Nunjucks, Markdown, JSX, TypeScript, and multiple template engines, outputting static HTML to _site/. Because Lume has no server process in production, AI bot protection combines robots.txt, noai meta tags, host-level headers, and Edge Functions. robots.txt: place in source root + site.copy("robots.txt") in _config.ts — copies to _site/ verbatim. Alternative: src/robots.txt.ts page with url: "/robots.txt" export for environment-based dynamic content (page wins over static copy if both exist). noai meta: add to base layout in _includes/ — Nunjucks: {{ robots | default('noai, noimageai') }}, JSX: page.data.robots ?? "noai, noimageai", Liquid: {{ robots | default: 'noai, noimageai' }}. Global default: site.data("robots", "noai, noimageai") in _config.ts or _data.yml at source root. Per-page override: robots: index, follow in front matter. X-Robots-Tag: Netlify netlify.toml [[headers]], Vercel vercel.json headers(), Cloudflare Pages _headers file (must site.copy("_headers") — files starting with _ not auto-copied). Hard 403: Netlify Edge Function (netlify/edge-functions/bot-block.ts); Vercel middleware.ts in project root (NOT inside _site/); Cloudflare Pages functions/_middleware.ts (functions/ is outside publish dir). Deno Deploy: custom server.ts with UA check before serveDir() — serveDir() returns immutable Response, clone headers with new Headers(res.headers) before setting X-Robots-Tag. GitHub Actions + deployctl deploy for CI. Deployment table: Deno Deploy / Netlify / Vercel / Cloudflare Pages / GitHub Pages / Firebase Hosting.
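The header-cloning step mentioned for the Deno Deploy server can be sketched like this. `serveStatic` is a hypothetical parameter standing in for the static file handler (serveDir() in the guide), which keeps the block runnable without Deno's standard library. The bot list is abbreviated.

```typescript
const AI_BOT_PATTERN = /GPTBot|ClaudeBot/i;

// Copy a Response whose headers cannot be modified, adding X-Robots-Tag.
function withRobotsTag(res: Response): Response {
  const headers = new Headers(res.headers); // mutable copy of the originals
  headers.set("X-Robots-Tag", "noai, noimageai");
  return new Response(res.body, {
    status: res.status,
    statusText: res.statusText,
    headers,
  });
}

// serveStatic is a hypothetical stand-in for the static file handler.
async function handle(
  req: Request,
  serveStatic: (req: Request) => Promise<Response>,
): Promise<Response> {
  const ua = req.headers.get("user-agent") ?? "";
  if (AI_BOT_PATTERN.test(ua)) {
    return new Response("Forbidden", { status: 403 }); // UA check before serving files
  }
  return withRobotsTag(await serveStatic(req));
}
```

Copying into a fresh `Headers` object preserves the original status and content type while allowing the extra header to be set.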
Read guide →
How to Block AI Bots on NGINX Unit: Complete 2026 Guide
NGINX Unit is configured entirely through a JSON REST API: no config file, no restarts. Bot blocking uses routes: a JSON array of steps evaluated in order. Each step has a match object and an action. Add the bot-blocking step first: match.headers["User-Agent"] as an array of wildcard patterns ("*GPTBot*", "*ClaudeBot*", ...) [OR logic] or a single ~regex string. Action: {"return": 403}. Wildcard * matches any sequence; ~ prefix = PCRE regex with (?i) for case-insensitive. Apply via control API: curl -X PUT --data-binary @unit.json --unix-socket /var/run/control.unit.sock http://localhost/config (live, no restart). The control API accepts GET, PUT, POST, and DELETE: PUT to a specific path (e.g. /config/routes) replaces that section; POST to /config/routes appends a step to the end of the array, so to put the blocking step first, PUT the full routes array. X-Robots-Tag: response_headers in pass action ({"pass": "applications/myapp", "response_headers": {"X-Robots-Tag": "noai, noimageai"}}). robots.txt: add route step with match.uri = "/robots.txt" + action.share = "/var/www/static$uri" (before app pass step). No custom body in return action; forward to the app if a body is needed. Docker: unit:1.32.1-python3.12 (or node/go/php/ruby tags), /docker-entrypoint.d/ for initial config. Available for Python, Node, Go, PHP, Ruby, Java.
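To make the step ordering concrete, here is a small sketch that builds the routes array and prints it as JSON suitable for a PUT to /config/routes. The `RouteStep` type is a hypothetical helper for this example, the application name and share path are the ones used in the summary above, and the pattern list is abbreviated.

```typescript
// Hypothetical helper type describing one route step.
interface RouteStep {
  match?: { headers?: Record<string, string[]>; uri?: string };
  action: Record<string, unknown>;
}

const routes: RouteStep[] = [
  // 1. Bot blocking first: any matching pattern (OR logic) returns 403.
  {
    match: { headers: { "User-Agent": ["*GPTBot*", "*ClaudeBot*"] } },
    action: { return: 403 },
  },
  // 2. robots.txt served from disk before the app is involved.
  {
    match: { uri: "/robots.txt" },
    action: { share: "/var/www/static$uri" },
  },
  // 3. Everything else goes to the app, with X-Robots-Tag added.
  {
    action: {
      pass: "applications/myapp",
      response_headers: { "X-Robots-Tag": "noai, noimageai" },
    },
  },
];

const routesJson = JSON.stringify(routes, null, 2);
console.log(routesJson);
```

Steps are evaluated in order, so the blocking step has to stay ahead of the pass step for the 403 to take effect.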
Read guide →
How to Block AI Bots on OpenLiteSpeed: Complete 2026 Guide
OpenLiteSpeed (OLS) supports Apache-compatible .htaccess — most Apache bot blocking rules work unchanged. Enable .htaccess first in WebAdmin: Virtual Hosts → [vhost] → General → Enable .htaccess → Yes. RewriteRule approach: RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|...) [NC,OR] + RewriteRule .* - [F,L]. SetEnvIfNoCase + <RequireAll> alternative (requires mod_setenvif + mod_authz_core). X-Robots-Tag: WebAdmin → Virtual Host → General → Custom Response Headers (no modules needed); or .htaccess Header always set (requires mod_headers). robots.txt: place in document root, served automatically. LSCache gotcha: serves cached responses before .htaccess rules fire — blocked bots may get cached pages. Fix: add rules at server level in WebAdmin → Server Configuration → Rewrite (fires before cache) or use ModSecurity (phase:1, pre-cache). ModSecurity rules: SecRule REQUEST_HEADERS:User-Agent with @rx regex, phase:1, deny, status:403. Server-level vs vhost-level: server rules apply across all vhosts and beat cache; vhost/.htaccess rules fire after. CLI reload: /usr/local/lsws/bin/lswsctrl restart. OLS vs LiteSpeed Enterprise: identical .htaccess config, Enterprise adds cPanel/WHM/HTTP3/QUIC.
Read guide →
How to Block AI Bots on Pelican: Complete 2026 Guide
Pelican is the most popular Python static site generator. robots.txt: use EXTRA_PATH_METADATA in pelicanconf.py — create extra/robots.txt, add 'extra' to STATIC_PATHS, map {'extra/robots.txt': {'path': 'robots.txt'}} to output root. Don't place in theme static/ (goes to output/theme/, not root). Dynamic robots.txt: write a plugin using pelican.signals.finalized (fires after output is written) — connect generate function, write to output path, vary by PELICAN_ENV environment variable. noai meta tag: edit themes/[theme]/templates/base.html Jinja2 template — {% if article is defined and article.robots %} chain with fallback to 'noai, noimageai'; copy theme to local dir first, never edit system-wide theme files. Per-article override: :robots: index, follow, noai, noimageai in RST header; robots: ... in Markdown metadata block. X-Robots-Tag: hosting layer — Netlify netlify.toml [[headers]]; Vercel vercel.json; Cloudflare Pages: extra/_headers mapped via EXTRA_PATH_METADATA (same pattern as robots.txt). GitHub Pages: no custom headers, noai meta only. Hard 403: Netlify Edge Function; Cloudflare Pages functions/_middleware.ts. Full pelicanconf.py with STATIC_PATHS, EXTRA_PATH_METADATA, THEME, PLUGINS, URL structure.
Read guide →
How to Block AI Bots on Lighttpd: Complete 2026 Guide
Lighttpd ("lighty") is a lightweight, event-driven web server popular on VPS and embedded Linux. Bot blocking uses mod_access with $HTTP["useragent"] conditional blocks: $HTTP["useragent"] =~ "GPTBot|ClaudeBot|..." { url.access-deny = ("") }. =~ operator = POSIX extended regex, case-sensitive; =~* for case-insensitive (Lighttpd 1.4.46+); (?i) flag for older versions. url.access-deny = ("") denies all URLs with 403; can also list specific paths. Both mod_access and mod_setenv must be listed in server.modules (bundled, no install needed). X-Robots-Tag: setenv.add-response-header = ("X-Robots-Tag" => "noai, noimageai") — use set-response-header (1.4.46+) to avoid duplicates if backend also sets it. robots.txt: place in server.document-root, served automatically. Conditional nesting: $HTTP["url"] =~ "^/blog/" { $HTTP["useragent"] =~ "GPTBot" { url.access-deny = ("") } } for path-specific rules (3 levels max). Rate limiting: no built-in request rate limiting — use connection.limit (per-IP), iptables, or fail2ban. Module order matters in server.modules list. Config test + graceful reload: lighttpd -t -f + systemctl reload lighttpd.
Read guide →
How to Block AI Bots on IIS: Complete 2026 Guide
IIS uses web.config XML for configuration. Bot blocking requires the URL Rewrite module (free, must be installed separately — without it the <rewrite> section causes a 500 error). Rule pattern: match url=".*" + condition input="{HTTP_USER_AGENT}" pattern="(GPTBot|ClaudeBot|...)" ignoreCase="true" + action type="CustomResponse" statusCode="403". Always include stopProcessing="true" to halt the rewrite pipeline. AbortRequest vs CustomResponse: AbortRequest closes TCP immediately (fastest), CustomResponse sends a proper HTTP 403 (cleaner for bot crawlers). X-Robots-Tag: system.webServer/httpProtocol/customHeaders — <add name="X-Robots-Tag" value="noai, noimageai" /> (no extra modules, applies to all responses). Use <remove> before <add> to prevent duplicate headers. robots.txt: place in site root directory; if 404, add MIME type <mimeMap fileExtension=".txt" mimeType="text/plain" /> under staticContent. Dynamic IP Restrictions (separate extension): denyByRequestRate maxRequests + denyConcurrentRequests for rate limiting. ARR reverse proxy: add bot blocking rule before the ARR proxy rule so blocked bots never hit the backend. GUI alternative: IIS Manager → URL Rewrite / HTTP Response Headers — changes write back to web.config.
Read guide →
How to Block AI Bots on Varnish Cache: Complete 2026 Guide
Varnish is configured entirely through VCL (Varnish Configuration Language). Bot blocking goes in vcl_recv — the first subroutine for every request, runs before cache lookup and before any backend hit (zero backend load for blocked bots). UA matching: req.http.User-Agent ~ "(?i)(GPTBot|ClaudeBot|...)" then return(synth(403, "Forbidden")). The ~ operator does PCRE regex; (?i) = case-insensitive. vcl_synth handles the synthetic response: set Content-Type, call synthetic("Forbidden"), return(deliver). std.log() writes to VSL (varnishlog -g request -q). X-Robots-Tag: beresp.http.X-Robots-Tag in vcl_backend_response (cached with object) or resp.http.X-Robots-Tag in vcl_deliver (every delivery, not cached). robots.txt: detect req.url == "/robots.txt" in vcl_recv, return(synth(800, "robots")), build response in vcl_synth with synthetic() — use custom status 800 to avoid conflicting with real 200 responses. Rate limiting: vsthrottle VMOD (varnish-modules package) — vsthrottle.is_denied(req.http.X-Forwarded-For, 100, 10s) → synth(429). VCL ACL for IP whitelist: acl trusted_crawlers { "10.0.0.0"/8; } then if (client.ip ~ trusted_crawlers) { return(pass); }. Varnish open source has no TLS — put nginx/Caddy/HAProxy in front for HTTPS. Hot reload: varnishadm vcl.load + vcl.use (no restart needed).
Read guide →
How to Block AI Bots on VitePress: Complete 2026 Guide
VitePress powers the docs for Vue, Vite, Rollup, Vitest, Pinia — one of the most-used documentation engines in the JS ecosystem. robots.txt: place in public/ directory (same level as .vitepress/) — VitePress copies public/ to .vitepress/dist/ on build; if docs are in docs/, put it at docs/public/robots.txt. noai meta tag: use transformHead hook in .vitepress/config.ts — runs at build time for every page, receives TransformContext with pageData.frontmatter; return [['meta', { name: 'robots', content: robots }]] with per-page default fallback. Simple alternative: global head array for same tag on every page (no per-page logic). Per-page override: add robots: 'index, follow' to YAML frontmatter — transformHead reads pageData.frontmatter.robots ?? 'noai, noimageai'. Frontmatter head array also works but merges (not replaces) global head entries — use transformHead for clean override. X-Robots-Tag: hosting layer — Netlify netlify.toml [[headers]]; Vercel vercel.json; Cloudflare Pages public/_headers (copied to dist). GitHub Pages: no custom headers, noai meta only. Hard 403: Netlify Edge Function; Cloudflare Pages functions/_middleware.ts. Full config.ts: transformHead with frontmatter override, themeConfig nav/sidebar.
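The transformHead logic described above can be sketched as a plain function. `TransformContextLike` and `HeadConfig` are simplified local stand-ins for the VitePress types, declaring only what this example needs.

```typescript
// Simplified stand-ins for the VitePress types used by transformHead.
type HeadConfig = [string, Record<string, string>];
interface TransformContextLike {
  pageData: { frontmatter: Record<string, unknown> };
}

const DEFAULT_ROBOTS = "noai, noimageai";

// Runs once per page at build time; the returned tags are added to <head>.
function transformHead({ pageData }: TransformContextLike): HeadConfig[] {
  const override = pageData.frontmatter.robots;
  const robots = typeof override === "string" ? override : DEFAULT_ROBOTS;
  return [["meta", { name: "robots", content: robots }]];
}
```

In .vitepress/config.ts the function would be passed as the `transformHead` option. A page with `robots: 'index, follow'` in its frontmatter gets that value, and every other page falls back to the default.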
Read guide →
How to Block AI Bots on HAProxy: Complete 2026 Guide
HAProxy is ACL-driven — bot blocking is concise and fires before the backend. acl is_ai_bot req.hdr(User-Agent) -m sub -i GPTBot ClaudeBot... then http-request deny status 403 if is_ai_bot. Block in the frontend (not backend) — saves backend resources since the request never reaches the upstream. ACL flags: -m sub (substring, fast, preferred for bot names) vs -m reg (regex, slower); -i flag for case-insensitive. Multiple values on one acl line = OR logic; repeated acl lines with same name = OR too. X-Robots-Tag: http-response set-header in backend block (applies to proxied responses only) or frontend (applies to all including HAProxy error pages). robots.txt: serve directly from HAProxy with http-request return status 200 content-type text/plain file /etc/haproxy/robots.txt (HAProxy 2.4+) using path /robots.txt ACL — avoids hitting backend for every robots.txt request. Rate limiting: stick-table type ip store http_req_rate(10s), track-sc0 src table, deny when sc_http_req_rate(0) gt 100 — complements UA blocking for spoofed bots. Logging: http-request capture req.hdr(User-Agent) len 100 + set-log-level warning for blocked bots. Docker: haproxy:2.8-alpine, SIGUSR2 for graceful reload. Full config: SSL termination, HTTP→HTTPS redirect, rate limiting, bot blocking, robots.txt serving, stats page.
Read guide →
How to Block AI Bots on Sphinx: Complete 2026 Guide
Sphinx generates static HTML documentation — used by CPython, Django, NumPy, and thousands of OSS projects. robots.txt: use html_extra_path in conf.py (html_extra_path = ['robots.txt']) — DO NOT place in _static/, which outputs to _build/html/_static/robots.txt (not root). noai meta tag: create source/_templates/layout.html with {% extends "!layout.html" %} (! prefix uses original theme layout — critical, prevents infinite recursion) and extrahead block with {{ super() }} call; register with templates_path = ['_templates'] in conf.py. Alternative: html_meta = {'robots': 'noai, noimageai'} in conf.py (works with alabaster/PyData Sphinx Theme; less reliable on sphinx_rtd_theme). Per-page: .. meta:: RST directive or myst front matter (myst.html_meta.robots). X-Robots-Tag: hosting layer — Netlify netlify.toml [[headers]]; Vercel vercel.json; Cloudflare Pages _headers via html_extra_path. Read the Docs: free tier has no custom headers — noai meta + robots.txt only; RTD Business adds header support. Hard 403: Netlify Edge Function; Cloudflare Pages functions/_middleware.ts (outside docs/ directory). Deployment table: RTD free/Business, Netlify, Vercel, Cloudflare Pages, GitHub Pages.
Read guide →
How to Block AI Bots on Hexo: Complete 2026 Guide
Hexo generates a static site — no server process at runtime, so bot blocking splits across the content layer and the hosting platform. robots.txt: place in source/ directory — Hexo copies all source/ files to public/ on hexo generate, no config needed. Gotcha: if hexo-generator-robots is installed, it generates robots.txt from _config.yml — your manual source/robots.txt will conflict. Dynamic robots.txt: write a generator plugin in scripts/ (hexo.extend.generator.register — Hexo auto-loads all .js files in scripts/) to vary content by NODE_ENV. noai meta tag: edit theme layout partial (themes/[theme]/layout/_partial/head.ejs or head.njk) — add meta name='robots' content='<%= page.robots || "noai, noimageai" %>'. Per-page override via front matter (robots: index, follow, noai, noimageai). X-Robots-Tag: hosting layer only — Netlify: [[headers]] in netlify.toml; Vercel: headers() in vercel.json; Cloudflare Pages: source/_headers (copied to public/). GitHub Pages limitation: no custom headers, noai meta tag only. Hard 403: Netlify Edge Function in netlify/edge-functions/block-ai-bots.ts; Cloudflare Pages _middleware.ts in functions/ directory (outside source/). Deployment table: Netlify, Vercel, Cloudflare Pages, GitHub Pages, Firebase Hosting, AWS S3 + CloudFront.
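A Netlify Edge Function for the hard 403 layer might look like the sketch below, placed at netlify/edge-functions/block-ai-bots.ts. `EdgeContextLike` is a hypothetical stand-in for Netlify's context type, the /robots.txt exemption is an assumption added here, and the bot list is abbreviated.

```typescript
const AI_BOT_PATTERN = /GPTBot|ClaudeBot/i;

// Hypothetical stand-in for the Netlify edge context (only next() is used).
interface EdgeContextLike {
  next(): Promise<Response>;
}

function shouldBlock(ua: string | null, pathname: string): boolean {
  if (pathname === "/robots.txt") return false; // assumption: keep robots.txt readable
  return ua !== null && AI_BOT_PATTERN.test(ua);
}

export default async function blockAiBots(
  request: Request,
  context: EdgeContextLike,
): Promise<Response> {
  const { pathname } = new URL(request.url);
  if (shouldBlock(request.headers.get("user-agent"), pathname)) {
    return new Response("Forbidden", { status: 403 });
  }
  // Not a bot: continue to the static files Hexo generated.
  return context.next();
}

// Run the function on every path.
export const config = { path: "/*" };
```

The function lives outside source/, so Hexo never copies it into public/; Netlify picks it up from the repository at deploy time.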
Read guide →
How to Block AI Bots on Directus: Complete 2026 Guide
Directus is a headless CMS — it serves JSON APIs and an admin panel, not your website's HTML. Bot blocking targets the API and admin routes directly. robots.txt in public/ overrides Directus's default. Layers 3–4 use a TypeScript hook extension that registers Fastify hooks via init('app.before'): onRequest checks the User-Agent and reply.code(403).send() blocks bots before routing; onSend injects X-Robots-Tag on all legitimate responses. EXEMPT_PATHS pass /robots.txt through. noai meta tags go in your frontend (Next.js/Nuxt/SvelteKit), not in Directus. nginx alternative covers self-hosted deployments with map $http_user_agent blocking before requests reach Node.js.
Read guide →
How to Block AI Bots on Docusaurus: Complete 2026 Guide
Docusaurus (React-based, built by Meta) generates a static site that any hosting platform can serve. robots.txt: place robots.txt in the static/ directory; Docusaurus copies everything in static/ to the build/ root, no config needed. Dynamic robots.txt: create scripts/generate-robots.js and run it in a prebuild npm script. noai meta tag: cleanest approach is headTags in docusaurus.config.ts (no swizzling); headTags is a top-level config array (not under themeConfig) that accepts tag objects with tagName and attributes. Conditional global logic via Root swizzle: npx docusaurus swizzle @docusaurus/theme-classic Root --wrap creates src/theme/Root.tsx. Per-page override in MDX: use a <head> block directly in .md/.mdx files. X-Robots-Tag: platform-level. Netlify: [[headers]] in netlify.toml or _headers; Vercel: headers in vercel.json; Cloudflare Pages: static/_headers (copied to build/). GitHub Pages limitation: no custom headers, noai meta only. Hard 403 blocking: Netlify Edge Function (netlify/edge-functions/block-ai-bots.ts); Cloudflare Pages _middleware.ts with cf.botManagement.score check. Full docusaurus.config.ts: headTags with noai/noimageai meta, organizationName, presets-classic, custom CSS. Deployment: Vercel, Netlify, Cloudflare Pages, GitHub Pages comparison.
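The dynamic robots.txt step can be sketched as a pure function that a prebuild script would call and write to static/robots.txt. The environment split (block everything outside production) is an assumption made for illustration, `buildRobotsTxt` is a hypothetical name, and the bot list is abbreviated.

```typescript
// Abbreviated list; extend it with the crawlers you want to disallow.
const AI_BOTS = ["GPTBot", "ClaudeBot"];

// Hypothetical helper: returns the robots.txt body for a given environment.
function buildRobotsTxt(env: string): string {
  if (env !== "production") {
    // Assumption: staging and preview builds should not be crawled at all.
    return "User-agent: *\nDisallow: /\n";
  }
  const botRules = AI_BOTS.map((bot) => `User-agent: ${bot}\nDisallow: /`);
  const everyoneElse = "User-agent: *\nAllow: /";
  return [...botRules, everyoneElse].join("\n\n") + "\n";
}

console.log(buildRobotsTxt("production"));
```

A prebuild script would pass the current environment name and write the returned string to static/robots.txt before the Docusaurus build copies static/ into build/.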
Read guide →
How to Block AI Bots on MkDocs: Complete 2026 Guide
MkDocs generates a static site — no server process, so bot blocking splits across the content layer (MkDocs itself) and the hosting platform layer. robots.txt: place in docs/ directory — MkDocs copies all docs/ files to site/ on build, no config needed. Alternatively use an on_post_build hook (Python) to generate robots.txt dynamically (staging vs prod env var). noai meta tag: custom_dir override in mkdocs.yml, create overrides/main.html extending base.html with {% block extrahead %} — always call {{ super() }} first when using MkDocs Material (preserves favicons, OG tags, theme color). Per-page override via front matter robots: key and Jinja conditionals. X-Robots-Tag: comes from host, not MkDocs — Netlify: [[headers]] in netlify.toml; Vercel: headers in vercel.json; Cloudflare Pages: _headers file in docs/ (copied to site/). GitHub Pages limitation: no custom HTTP headers possible — noai meta tag is the only option; migrate to Cloudflare Pages for hard blocking. Hard 403 blocking: Netlify Edge Function (netlify/edge-functions/block-ai-bots.ts with onRequest handler, path=/*); Cloudflare Pages _middleware.ts. Full mkdocs.yml: Material theme, custom_dir, hooks, plugins. File structure: hooks.py, netlify.toml, edge-functions/, overrides/main.html, docs/.
Read guide →
How to Block AI Bots on Traefik: Complete 2026 Guide
Traefik has no built-in User-Agent blocking middleware — this is the key difference from nginx/Apache. Strategy: (1) Headers middleware for X-Robots-Tag via customResponseHeaders, (2) plugin middleware from Traefik Plugin Catalog for hard UA blocking, (3) application-layer middleware in your upstream (Next.js middleware.ts, Express) as the most reliable fallback. robots.txt: Traefik doesn't serve static files — serve from your upstream app (Next.js public/) or add an nginx sidecar with a Host+Path router rule. Provider suffix gotcha: middleware references must include @docker, @file, or @kubernetescrd — omitting the suffix causes 'middleware not found' errors at runtime. Static vs dynamic config: traefik.yml (entrypoints, providers, cert resolvers — restart required) vs dynamic.yml/Docker labels/K8s CRDs (routers, services, middlewares — live reload, no restart). Docker labels: traefik.http.middlewares.hdr.headers.customresponseheaders.X-Robots-Tag=noai, noimageai then traefik.http.routers.app.middlewares=hdr@docker. Kubernetes: Middleware CRD + IngressRoute CRD. Full static traefik.yml: entrypoints with HTTP→HTTPS redirect, HTTP/3, ACME Let's Encrypt, file provider with watch: true. Full docker-compose: external proxy network, no exposed ports on app container, dynamic.yml with HSTS + X-Robots-Tag.
Read guide →
How to Block AI Bots on Caddy: Complete 2026 Guide
Caddy's named matchers make bot blocking concise: @bad_bot header_regexp bot User-Agent (?i)(GPTBot|ClaudeBot|...) then respond @bad_bot 403. Critical: Caddy evaluates directives by precedence, not top-to-bottom — respond fires before reverse_proxy/file_server regardless of order in Caddyfile (unlike nginx/Apache). header matcher glob (*GPTBot*) is case-sensitive; header_regexp with (?i) flag is case-insensitive. Multiple header User-Agent lines in a named matcher = OR conditions. header X-Robots-Tag adds to all responses including errors by default (no 'always' keyword needed, unlike nginx). robots.txt served automatically by file_server from root dir — no location block needed. Rate limiting not built in — requires caddy-ratelimit plugin (xcaddy build) or Cloudflare in front. Reverse proxy: respond @bad_bot fires before reverse_proxy due to precedence; handle /robots.txt { file_server } for local serving. header_down vs header: header modifies what Caddy sends to client, header_down inside reverse_proxy modifies upstream response. Admin API (localhost:2019): live config updates via JSON API, disable with { admin off } in production. Automatic HTTPS via Let's Encrypt (no SSL config). Docker: caddy:alpine with caddy_data volume for TLS cert persistence. Multi-stage Dockerfile for custom plugin builds (xcaddy). Full Caddyfile: global options, bad_bot matcher, X-Robots-Tag, file_server + reverse_proxy patterns, www redirect.
Read guide →
How to Block AI Bots on Apache: Complete 2026 Guide
Apache has two clean paths for bot blocking: mod_rewrite (RewriteCond %{HTTP_USER_AGENT} with pipe-separated regex + [F,L] flags) and mod_setenvif (BrowserMatch sets bad_bot env var, then Require not env bad_bot in a LocationMatch). Critical: always use VirtualHost config, never .htaccess for this — Apache re-parses .htaccess on every request including the bots you're blocking, and requires AllowOverride All which is a security risk. Use .htaccess only on shared hosting. mod_rewrite flag syntax: [NC] case-insensitive, [OR] joins conditions with OR, [F] returns 403, [L] stops processing — the last RewriteCond in the chain has no [OR] (intentional). mod_setenvif + RequireAll + Require not env is the cleaner Apache 2.4 pattern. robots.txt: place in DocumentRoot (served automatically), add Location /robots.txt block with SetHandler default-handler and AllowOverride None. X-Robots-Tag: Header always set X-Robots-Tag "noai, noimageai" — always keyword required for 4xx/5xx responses, mod_headers must be enabled (a2enmod headers). noai meta: Apache doesn't inject HTML — add to base layout or PHP header.php. Rate limiting: mod_evasive (DOSPageCount, DOSSiteCount, DOSBlockingPeriod) catches UA-rotating bots. Reverse proxy: mod_proxy + mod_proxy_http, bot check fires before ProxyPass. Full VirtualHost with AllowOverride None (never All in production), SPA fallback, SSL. Docker: httpd:alpine. Apache 2.4 authz syntax (Require) vs legacy 2.2 (Order/Deny) covered.
Read guide →
How to Block AI Bots on Nginx: Complete 2026 Guide
Nginx is the most powerful blocking layer — it sits in front of everything and rejects bots before they reach your app server. The map block (User-Agent → $bad_bot variable) must be in http {}, not server {} or location {} — putting it in the wrong context causes a config error. Hard 403 via if ($bad_bot) { return 403; } inside location / is safe for a pure return, despite nginx's "if is evil" reputation. robots.txt: exact-match location = /robots.txt with try_files, access_log off, and 1-day cache. X-Robots-Tag via add_header with the "always" keyword — without always, nginx skips the header on 4xx/5xx responses. Child location blocks with any add_header replace inherited ones (inheritance gotcha — repeat headers or use ngx_http_headers_more_module). Rate limiting: limit_req_zone in http {}, limit_req + limit_req_status 429 in location {}. noai meta tags: nginx doesn't inject HTML — add to base layout in your SSG or use X-Robots-Tag as the HTTP equivalent. Reverse proxy: bot check fires before proxy_pass, blocked bots never reach Node/Python/PHP origin. Full nginx.conf example: map block, rate limiting, robots.txt location, 403 blocking, reverse proxy, SSL, HTTP→HTTPS redirect. Docker: nginx:alpine with baked config or volume-mounted. Ubuntu/Debian: conf.d/bot-map.conf pattern for clean org.
Read guide →
How to Block AI Bots on Strapi: Complete 2026 Guide
Strapi is headless — noai meta tags belong on your frontend (Next.js/Nuxt/Gatsby/Astro), not in Strapi. What Strapi controls: public/robots.txt (served via koa-static, no config needed), custom Koa middleware in src/middlewares/ for hard 403 blocking (registered in config/middlewares.ts), and X-Robots-Tag on responses. Critical: always exempt /admin, /api, /graphql, /_health, and /robots.txt in middleware — /admin breaks your CMS panel if blocked. For API scraping protection: Strapi Roles & Permissions (require auth on sensitive endpoints) is more robust than User-Agent blocking, since bots can fake UAs. Optional blockApi: true config flag to also block known AI bots from /api/*. X-Robots-Tag via after-next ctx.set() middleware (skip on /admin). nginx map block for network-level blocking before Node sees the request. Two-domain note: api.example.com and example.com each need their own robots.txt. Strapi v4 (CommonJS) vs v5 (ESM/CommonJS) middleware syntax covered.
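A sketch of the Koa-style middleware with the exemptions listed above. `KoaContextLike` is a hypothetical stand-in for the Koa context, `shouldBlock` and `matchesPrefix` are illustrative helper names, and the bot list is abbreviated. In a real project the factory would be the default export of a file in src/middlewares/ and would be registered in config/middlewares.ts.

```typescript
const AI_BOT_PATTERN = /GPTBot|ClaudeBot/i;

// Paths that must never be blocked by default.
const EXEMPT_PREFIXES = ["/admin", "/api", "/graphql", "/_health", "/robots.txt"];

// Hypothetical stand-in for the parts of the Koa context used here.
interface KoaContextLike {
  path: string;
  status: number;
  body: unknown;
  get(field: string): string; // request header, "" when absent
  set(field: string, value: string): void; // response header
}
type Next = () => Promise<unknown>;

function matchesPrefix(path: string, prefix: string): boolean {
  return path === prefix || path.startsWith(prefix + "/");
}

function shouldBlock(ua: string, path: string, blockApi: boolean): boolean {
  if (!AI_BOT_PATTERN.test(ua)) return false;
  if (blockApi && matchesPrefix(path, "/api")) return true; // opt-in API blocking
  return !EXEMPT_PREFIXES.some((prefix) => matchesPrefix(path, prefix));
}

// Factory shape: receives the middleware config, returns the Koa handler.
const aiBotBlock = (config: { blockApi?: boolean }) =>
  async (ctx: KoaContextLike, next: Next): Promise<void> => {
    if (shouldBlock(ctx.get("user-agent"), ctx.path, config.blockApi === true)) {
      ctx.status = 403;
      ctx.body = "Forbidden";
      return;
    }
    await next();
    // Header is set after next(), and skipped for the admin panel.
    if (!matchesPrefix(ctx.path, "/admin")) {
      ctx.set("X-Robots-Tag", "noai, noimageai");
    }
  };
```

Matching on whole path segments avoids treating an unrelated route such as /administrators as part of the admin panel.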
Read guide →
How to Block AI Bots on Payload CMS: Complete 2026 Guide
Payload v3 runs on Next.js App Router — the critical addition over a plain Next.js setup is the admin panel exemption. middleware.ts matcher must exclude /admin/:path*, /api/:path*, and /_next/:path* or you risk accidentally blocking your own admin session. robots.txt via public/ (static) or app/robots.ts Metadata API (env-aware); CMS-editable robots via a Payload Global (blockAiBots checkbox → query in robots.ts, no code deploy needed). noai meta via generateMetadata() other: { robots: 'noai, noimageai' } — global default in root layout, per-document override from Payload collection field. X-Robots-Tag in middleware.ts or next.config.mjs headers() with /admin|api exclusion pattern. withPayload() wrapper in next.config.mjs is required. Payload v2 (Express): app.use() middleware before payload handler with /admin + /api path guard. Deployment: Vercel (auto-detect), Railway, Render, Docker + nginx, Payload Cloud.
Read guide →
How to Block AI Bots on SolidStart: Complete 2026 Guide
SolidStart runs on Vinxi with SSR by default. Bot blocking lives in src/middleware.ts via createMiddleware from @solidjs/start/middleware — one middleware file per app, handlers compose via the onRequest array. Key insight: SolidStart only supports a single middleware entry point, so all logic (bot blocking, auth, X-Robots-Tag) goes in one file. Compile User-Agent regex at module scope (not per-request), exempt /robots.txt before checking UA, return new Response('Forbidden', { status: 403 }) to short-circuit. public/robots.txt is copied verbatim by Vinxi — no config needed. noai meta tags via @solidjs/meta <Meta> are server-rendered in SSR mode (bots see them); in SPA mode (ssr: false) they're JS-only and invisible to crawlers. X-Robots-Tag via onBeforeResponse handler. SSG/prerender mode: middleware.ts still wires up but pages are static — add platform edge rules for hard blocking. Deployment presets: vercel, netlify, cloudflare-pages, node, bun, static.
Read guide →How to Block AI Bots on PocketBase: Complete 2026 Guide
PocketBase is a single Go binary — bot blocking hooks into its OnBeforeServe() event to register a global Echo middleware before the server starts accepting requests. The middleware runs before all PocketBase routes: check the User-Agent, return c.String(http.StatusForbidden, "Forbidden") to block, or call next(c) to continue. X-Robots-Tag set after next(c) returns. robots.txt requires zero code — place it in pb_public/ and PocketBase serves it automatically. Binary mode users (no Go source) must use nginx/Caddy in front of PocketBase for Layers 3–4. Covers SPA noai meta in index.html, custom Go html/template routes for server-rendered pages, and selective route group protection.
Read guide →How to Block AI Bots on Elixir Phoenix: Complete 2026 Guide
Phoenix uses the Plug pipeline — every request flows through a chain of Plug modules before reaching your controller. Plug.Static in endpoint.ex serves robots.txt from priv/static/ (the only: whitelist gotcha — if you restrict to specific extensions, add "robots.txt" or it's silently skipped). Custom Plug with init/1 + call/2 for hard 403 blocking: @blocked_ua_pattern module attribute compiles the regex at compile time (not per-request), halt() is mandatory after send_resp or the request continues through the pipeline. Place the blocking plug in endpoint.ex before the router for app-wide coverage. noai meta in root.html.heex with assigns[:robots_meta] override per-route. LiveView SSR renders the full HTML on first request — bots see all meta tags without JavaScript. X-Robots-Tag via a pipeline plug. Deployment: Fly.io with mix release, Gigalixir, Docker multi-stage with Elixir + OTP.
Read guide →How to Block AI Bots on Eleventy (11ty): Complete 2026 Guide
Eleventy produces a static site with no server process — two things are critical. First: addPassthroughCopy("robots.txt") in eleventy.config.js. Without this, robots.txt is silently dropped from _site/ on every build — the #1 Eleventy SEO mistake. If your input directory is "src", the path must be "src/robots.txt". Second: hard blocking requires a layer in front of the static output (Netlify Edge Function, Vercel middleware.js, Cloudflare Pages _middleware.ts, or nginx map block). noai meta tags go in your base layout (_includes/base.njk or .liquid or .webc) using front matter robots variable with a | default('noai, noimageai') filter — renders server-side so non-JS crawlers always see it. X-Robots-Tag set via netlify.toml [[headers]], vercel.json headers, Cloudflare _headers, or nginx add_header. Eleventy 2.x (module.exports in .eleventy.js) vs 3.x (export default in eleventy.config.js) config syntax covered for both.
Read guide →How to Block AI Bots on Cloudflare Workers: Complete 2026 Guide
Workers run at the edge — before your origin ever sees the request. Workers have no file system: serve robots.txt via Workers Static Assets ([assets] binding in wrangler.toml, served before your Worker runs) or embed it as a string constant in your fetch handler. The ES module syntax (export default { async fetch() }) is the current standard. Compile BLOCKED_UAS regex at module scope — not per-request. Workers KV lets you update blocked UA patterns without redeploying: module-level cache with 60-second TTL avoids a KV read on every request. Pages Functions _middleware.ts is the equivalent for Pages-hosted sites. Covers Hono app.use('*') middleware on Workers, KV dynamic rules, X-Robots-Tag via Response cloning (Workers Responses are immutable — clone before mutating headers), and comparison with Cloudflare's built-in AI Scrapers & Crawlers toggle.
Read guide →How to Block AI Bots on Hapi.js: Complete 2026 Guide
Hapi has no app.use() — bot blocking uses two server lifecycle extensions instead. onPreAuth fires before authentication and any route handler: check the User-Agent and return h.response('Forbidden').code(403).takeover() to block — .takeover() is required or Hapi continues executing the handler anyway. onPreResponse fires after the handler: call response.header('X-Robots-Tag', 'noai, noimageai') for all non-Boom responses. @hapi/inert provides a file handler for robots.txt. @hapi/vision provides h.view() for Handlebars/Nunjucks templates with per-page robots override. Plugin-scoped extensions apply only to routes registered within the same plugin.
Read guide →How to Block AI Bots on Koa.js: Complete 2026 Guide
Koa's onion-model middleware splits bot blocking across two phases: the upstream phase (before await next()) checks the User-Agent and returns a hard 403 for AI bots without calling next() — no route handler runs, zero database hits. The downstream phase (after await next() returns) injects X-Robots-Tag on every legitimate response. koa-static registered before the bot middleware serves /robots.txt as a static file before the bot check runs. ctx.state.robots flows into Nunjucks/EJS/Pug templates via koa-views. router.use('/prefix', middleware) applies selective blocking to API routes only.
Read guide →How to Block AI Bots on Hono: Complete 2026 Guide
Hono is runtime-agnostic — the same app.use('*') middleware runs unchanged on Cloudflare Workers, Bun, Deno, and Node.js. Compile the User-Agent regex at module scope (not per-request). robots.txt is handled explicitly: serveStatic from the runtime adapter for Node.js/Bun, a string constant GET /robots.txt route for Cloudflare Workers (no file system in isolates). X-Robots-Tag via a post-next middleware that calls c.res.headers.set() after await next(). JSX noai meta tags via /** @jsxImportSource hono/jsx */ with c.html(). Only the entrypoint differs between runtimes: export default app for Workers, serve(app) for Node.js, { fetch: app.fetch } for Bun, Deno.serve(app.fetch) for Deno.
Read guide →How to Block AI Bots on NestJS: Complete 2026 Guide
NestJS has a layered request pipeline: Middleware fires before Guards, which fire before Interceptors. The fastest bot block is app.use() in main.ts: no DI, just a raw Express middleware registered before app.listen(). For DI access (config, logging, database), use NestMiddleware with MiddlewareConsumer.apply().exclude().forRoutes('*'). Guards via CanActivate + APP_GUARD token let you block per-controller with @UseGuards() or globally via DI. Covers ServeStaticModule.forRoot() for robots.txt, dynamic GET /robots.txt controller with ConfigService, NestMiddleware class, CanActivate guard, NestInterceptor for X-Robots-Tag via rxjs tap(), Handlebars @Render() noai meta tag, APP_GUARD DI-aware global guard, nginx, and Docker multi-stage build.
Read guide →How to Block AI Bots on Fastify: Complete 2026 Guide
Fastify's lifecycle hook system differs from Express: addHook('onRequest') fires after routing but before body parsing, which makes it the correct place for bot blocking. preHandler fires after body parsing, wasting cycles on a request you'll reject. The key encapsulation gotcha: a hook registered inside a plugin only applies to routes in that plugin's scope. Wrap your bot-blocking plugin with fp() (fastify-plugin) to break encapsulation and apply the hook globally. Covers @fastify/static for robots.txt, global onRequest hook with fp(), scoped preHandler for per-route blocking, onSend hook for X-Robots-Tag, @fastify/view noai meta tags, TypeScript + ESM setup, nginx map block, and Docker multi-stage build.
Read guide →How to Block AI Bots on Bun: Complete 2026 Guide
Bun.file() returns a BunFile (a Blob) you can pass directly to new Response() — serving robots.txt is a one-liner with no extra processing or Content-Type setting. For bun build --compile single-binary deployments, embed robots.txt as a string constant since Bun.file() cannot read files that don't exist at runtime. Hono middleware uses app.use('*') registered before routes; Elysia uses .onBeforeHandle() global hooks. Express runs on Bun unchanged — just replace node with bun. Covers Bun.serve() raw handler, Hono, Elysia, Express on Bun, bun build --compile with distroless Docker image, X-Robots-Tag, and nginx reverse proxy.
Read guide →How to Block AI Bots on Deno: Complete 2026 Guide
Deno has no auto-served static directory — robots.txt needs an explicit route in Deno.serve() or a file-based handler in Fresh (routes/robots.txt.ts). The key Deploy constraint: Deno Deploy isolates have no file system access, so embed robots.txt as a string constant. Oak middleware uses app.use() registered before app.use(router.routes()) — Oak processes middleware in order, so a bot-blocking use() registered first intercepts every request. Covers Deno.serve() raw handler, Oak app.use() middleware, Fresh routes/_middleware.ts (1.x and 2.x), embedded constant for Deno Deploy, Deno permissions scoping, X-Robots-Tag header, nginx reverse proxy, and Docker with denoland/deno image.
Read guide →How to Block AI Bots on Kotlin (Ktor): Complete 2026 Guide
Ktor is JetBrains' coroutine-native Kotlin web framework built around a pipeline of interceptable phases. Place robots.txt in src/main/resources/static/ and install the StaticContent plugin — Ktor serves it at /robots.txt with zero extra routing. For hard blocking, create a custom ApplicationPlugin with createApplicationPlugin() and intercept ApplicationCallPipeline.Plugins — the Plugins phase runs before routing resolves, making it the earliest reliable interception point. Covers staticResources vs staticFiles API difference (Ktor 2.3+), dynamic GET /robots.txt route, noai meta tags in Thymeleaf and FreeMarker templates, X-Robots-Tag header, nginx map block, and Docker deployment.
Read guide →How to Block AI Bots on Svelte: Complete 2026 Guide
Svelte compiled with Vite produces a pure SPA — no server, no middleware, no SSR. Vite copies public/ verbatim to dist/, so public/robots.txt just works. The key insight: unlike <svelte:head> (JS-only, invisible to non-JS crawlers), Vite's index.html at the project root is a real file you can edit — add a noai meta tag there and every crawler sees it before JavaScript runs. For hard 403 blocking on Cloudflare Pages, add a _worker.js to public/ and Cloudflare runs it as an edge Worker before serving static assets. Covers Netlify Edge Functions, Vercel Edge Middleware, nginx map block, X-Robots-Tag via hosting config, and a Svelte SPA vs SvelteKit comparison table.
Read guide →How to Block AI Bots on Rust (Actix-web & Axum): Complete 2026 Guide
Rust has no runtime overhead for the bot check — compile your regex once with LazyLock<Regex> (stable since Rust 1.80, no external crate needed) and it costs nothing per request. Use include_str!("../static/robots.txt") to bake robots.txt into the binary at compile time — no static/ directory at runtime, ideal for Docker scratch images and serverless. Actix-web uses wrap_fn closure middleware applied to App or a scope; Axum uses middleware::from_fn() with Tower's layer() — use layer() for app-wide middleware and route_layer() only when you need per-route-only application. Covers Cargo.toml for both frameworks, tower-http SetResponseHeaderLayer for X-Robots-Tag, nginx reverse proxy config, and an Actix-web vs Axum comparison table.
Read guide →How to Block AI Bots on PHP: Complete 2026 Guide
PHP powers ~78% of websites — including millions on shared hosting where server config is locked down. On shared hosts (cPanel, Bluehost, SiteGround), .htaccess mod_rewrite is the only server-level option before PHP runs. Covers static robots.txt, dynamic robots.php with env-based rules, front controller $_SERVER['HTTP_USER_AGENT'] + preg_match() check, .htaccess per-bot and condensed single-regex variants, noai meta tags in layout templates with per-page override, nginx + PHP-FPM map block, and hosting comparison (shared/VPS/managed/platform).
Read guide →How to Block AI Bots on Go (net/http & Gin): Complete 2026 Guide
Go has no built-in middleware framework — blocking wraps your http.Handler or uses Gin's r.Use(). Compile the bot pattern once at package level with regexp.MustCompile (not inside the handler). Use go:embed to bake robots.txt into the binary at compile time — no static/ directory needed at runtime, ideal for Docker and serverless. Covers net/http middleware wrapper, Gin middleware with c.AbortWithStatus(403), X-Robots-Tag header middleware, nginx reverse proxy config, and Cloud Run/AWS Lambda WAF notes.
Read guide →How to Block AI Bots on Angular: Complete 2026 Guide
Angular's index.html is a real editable file — unlike Vue.js, you can drop a noai meta tag there and it reaches every crawler without JavaScript. For robots.txt, configure the assets array in angular.json (Angular 15 and below) or use the public/ folder (Angular 17+). Covers Angular Meta service per-route control, Angular Universal Express middleware for hard 403 blocking, nginx static serving config, and Netlify/Vercel/Firebase Hosting headers setup.
Read guide →How to Block AI Bots on ASP.NET Core: Complete 2026 Guide
ASP.NET Core's UseStaticFiles() middleware serves robots.txt from wwwroot/ with zero config. For hard blocking, implement a custom IMiddleware class registered before UseRouting() in Program.cs. Covers dynamic Minimal API endpoint with environment-based rules, noai meta tags in _Layout.cshtml with @section Head override, web.config IIS URL rewrite rules for Azure App Service, and nginx reverse proxy config in front of Kestrel.
Read guide →How to Block AI Bots on Spring Boot: Complete 2026 Guide
Spring Boot's static resource handler serves robots.txt from src/main/resources/static/ with zero config. For hard blocking, choose HandlerInterceptor (MVC layer, no Spring Security needed) or OncePerRequestFilter (servlet layer, fits SecurityFilterChain). Includes Thymeleaf layout noai tags and nginx + Kubernetes ingress configs.
Read guide →How to Block AI Bots on Symfony: Complete 2026 Guide
Symfony's EventSubscriber system handles bot blocking with one PHP class and no extra packages. KernelEvents::REQUEST at priority 9999 fires before routing and controllers — setResponse(new Response('Forbidden', 403)) short-circuits the entire kernel. KernelEvents::RESPONSE injects X-Robots-Tag on every outgoing response. Covers robots.txt in public/, noai meta via Twig default filter, EXEMPT_PATHS for robots.txt/sitemap, service autoconfiguration, environment-specific disabling with when@dev, and PHPUnit unit tests for the subscriber.
Read guide →How to Block AI Bots on Vue.js: Complete 2026 Guide
Vue SPAs have a unique property: bots that don't run JavaScript see only an empty <div id="app"></div> shell — your content and useHead() noai tags are invisible. This guide covers robots.txt in Vite public/, noai via index.html (before JS), X-Robots-Tag via nginx/Express, hard nginx blocking, and Netlify/Vercel headers config.
Read guide →How to Block AI Bots on WooCommerce: Complete 2026 Guide
WooCommerce stores are high-value AI targets — the REST API at /wp-json/wc/v3/products exposes your entire product catalog as machine-readable JSON. Covers robots.txt for shop/product/cart pages, WooCommerce REST API blocking, noai meta tags via functions.php, .htaccess hard blocking (Apache), and Cloudflare WAF for managed hosts.
Read guide →How to Block AI Bots on Ruby on Rails: Complete 2026 Guide
Every new Rails app ships with public/robots.txt — most developers never edit it. Replace its contents in 30 seconds. Also covers a dynamic RobotsController with environment rules, before_action in ApplicationController, a Rack middleware class inserted before the Rails stack, noai tags in the application layout with content_for override, and nginx with Puma.
Read guide →How to Block AI Bots on Express.js: Complete 2026 Guide
Express middleware order is the key concept: express.static() → blockAiBots middleware → routes. Covers static public/robots.txt, dynamic GET /robots.txt route handler with env rules, global X-Robots-Tag header middleware, app.use() hard blocking (JS + TypeScript), nginx reverse proxy config, and Docker Compose setup. Includes the critical TypeScript void-return pattern.
Read guide →How to Block AI Bots on Django: Complete 2026 Guide
Django has a built-in DISALLOWED_USER_AGENTS setting processed by CommonMiddleware: add compiled regex patterns to settings.py and matched bots get a 403 (PermissionDenied) before any view runs. Also covers robots.txt view, custom 403 middleware, base.html noai tags with per-page block override, nginx reverse proxy config, and PaaS deployment notes (Heroku, Railway, Render).
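A minimal settings.py sketch of the DISALLOWED_USER_AGENTS approach. The bot tokens below are an illustrative subset, not a complete list, and the AI_BOT_TOKENS name is hypothetical; the setting itself takes compiled regular expression objects that CommonMiddleware searches against the User-Agent header.

```python
# settings.py (fragment)
import re

# Hypothetical helper list: an illustrative subset of AI crawler tokens.
AI_BOT_TOKENS = ["GPTBot", "ClaudeBot", "CCBot", "Bytespider", "PerplexityBot"]

# Django expects compiled regex objects here. CommonMiddleware must be
# present in MIDDLEWARE for the setting to take effect.
DISALLOWED_USER_AGENTS = [
    re.compile(re.escape(token), re.IGNORECASE) for token in AI_BOT_TOKENS
]
```

Because the check is system-wide, it also applies to /robots.txt; a custom middleware is the better fit if that path needs an exemption.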
Read guide →How to Block AI Bots on FastAPI: Complete 2026 Guide
FastAPI has no built-in robots.txt — add a GET /robots.txt route returning a PlainTextResponse with all AI bots listed. For hard blocking, a Starlette BaseHTTPMiddleware subclass intercepts requests before any route handler runs and returns 403. Also covers X-Robots-Tag response headers, Jinja2 template noai tags, and nginx reverse proxy config with uvicorn.
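A framework-free sketch of the hard-blocking idea, written as a pure ASGI middleware rather than the BaseHTTPMiddleware subclass the guide describes, so it runs without FastAPI installed. The class name, the bot pattern, and the /robots.txt exemption are assumptions made for illustration.

```python
import re

# Illustrative pattern, compiled once at import time.
AI_BOT_PATTERN = re.compile(r"GPTBot|ClaudeBot|CCBot|Bytespider", re.IGNORECASE)


class BlockAiBotsMiddleware:
    """Pure ASGI middleware: answers 403 itself or delegates to the wrapped app."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        # Assumption: /robots.txt stays reachable so bots can read the opt-out.
        if scope["type"] == "http" and scope.get("path") != "/robots.txt":
            # ASGI headers are (name, value) byte pairs with lowercase names.
            headers = dict(scope.get("headers", []))
            user_agent = headers.get(b"user-agent", b"").decode("latin-1")
            if AI_BOT_PATTERN.search(user_agent):
                await send({
                    "type": "http.response.start",
                    "status": 403,
                    "headers": [(b"content-type", b"text/plain; charset=utf-8")],
                })
                await send({"type": "http.response.body", "body": b"Forbidden"})
                return  # the wrapped app is never called
        await self.app(scope, receive, send)
```

With FastAPI this would typically be registered via app.add_middleware(BlockAiBotsMiddleware).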
Read guide →How to Block AI Bots on Flask: Complete 2026 Guide
Flask is intentionally minimal — no built-in robots.txt, no user agent blocking. Add what you need: a @app.route('/robots.txt') for polite opt-out, a @app.before_request hook for hard 403 blocking before any view runs, or a Werkzeug WSGI middleware class wrapping app.wsgi_app for zero-framework-overhead blocking. Also covers X-Robots-Tag via after_request, Jinja2 template noai tags, and nginx + gunicorn config.
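The WSGI-level option can be sketched with the standard library alone. The class name and bot pattern here are illustrative assumptions; only the wrap-the-wsgi_app pattern comes from the summary above.

```python
import re

# Illustrative pattern, compiled once at import time.
AI_BOT_PATTERN = re.compile(r"GPTBot|ClaudeBot|CCBot|Bytespider", re.IGNORECASE)


class BlockAiBots:
    """WSGI middleware that rejects matching User-Agents before Flask runs."""

    def __init__(self, wsgi_app):
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        # WSGI exposes the header as HTTP_USER_AGENT; it may be absent.
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if AI_BOT_PATTERN.search(user_agent):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Anything else passes through to the wrapped application untouched.
        return self.wsgi_app(environ, start_response)
```

In a Flask app the wrapper would be applied as app.wsgi_app = BlockAiBots(app.wsgi_app).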
Read guide →How to Block AI Bots on Remix: Complete 2026 Guide
Remix's loader model lets you block AI bots in the root loader before any page renders — throw a 403 Response and nothing is generated. Also covers the robots[.]txt resource route (bracket-escaping explained), noai meta via root.tsx meta export, X-Robots-Tag via headers() export, and Cloudflare Workers edge blocking. Includes React Router v7 migration notes.
Read guide →How to Block AI Bots on Laravel: Complete 2026 Guide
Laravel gives you four independent layers: static public/robots.txt (zero config), a dynamic route with environment-based rules, a BlockAiBots middleware for hard 403 responses before any controller runs, and server-level .htaccess (Apache) or nginx blocks. Covers Laravel 10 Kernel.php and Laravel 11+ bootstrap/app.php. Includes Laravel Vapor serverless notes.
Read guide →How to Block AI Bots on BigCommerce: Complete 2026 Guide
BigCommerce is fully hosted SaaS — no server access. Block AI crawlers through the built-in admin panel robots.txt editor, Stencil theme robots.txt.html, noai meta tags in base.html, and Cloudflare WAF for hard blocking at the network edge. Includes a dedicated section on why e-commerce product catalogues are high-value AI scraping targets.
Read guide →How to Block AI Bots on SvelteKit: Complete 2026 Guide
SvelteKit's static/ directory, file-based +server.ts routing, and hooks.server.ts handle hook give you multiple layers — from polite robots.txt opt-out to hard 403 blocking before any route is processed. Covers adapter-static (SSG) and SSR modes.
Read guide →How to Block AI Bots on Nuxt.js: Complete 2026 Guide
Nuxt 3 has a dedicated @nuxtjs/robots module, routeRules for X-Robots-Tag headers, useHead() for per-page noai control, and Nitro server middleware for hard SSR blocking — everything you need without leaving Vue.
Read guide →How to Block AI Bots on Astro: Complete 2026 Guide
Astro's zero-JS output produces exactly what AI crawlers want. Block them with public/robots.txt (simplest), the native src/pages/robots.txt.ts endpoint, noai tags in BaseHead.astro, defineMiddleware for SSR, and public/_headers for Cloudflare Pages / Netlify.
Read guide →How to Block AI Bots on Gatsby: Complete 2026 Guide
Gatsby's pre-rendered HTML is ideal for AI crawlers. Block them with a plain static/robots.txt (no plugins needed), global noai tags via gatsby-ssr.js onRenderBody, Gatsby 5 Head export for per-page control, and static/_headers for Netlify/Cloudflare Pages edge blocking.
Read guide →How to Block AI Bots on Next.js: Complete 2026 Guide
Next.js 13+ App Router has native app/robots.ts support, metadata.robots for per-page noai tags, and edge middleware for hard blocking — the most complete AI bot control of any framework. Covers App Router, Pages Router, Vercel Edge Config, and next.config.js headers.
Read guide →How to Block AI Bots on Hugo: Complete 2026 Guide
Hugo's fast builds and clean HTML make it a frequent AI crawler target. Drop a plain robots.txt in static/ (copied directly to public/ — no template processing needed), add noai tags via baseof.html, and use static/_headers for Cloudflare Pages or netlify.toml for Netlify.
Read guide →How to Block AI Bots on Jekyll & GitHub Pages: Complete 2026 Guide
Jekyll's clean HTML and GitHub Pages' open accessibility make them AI crawler favourites. Block 25+ bots with a plain robots.txt in your project root, noai meta tags via your base layout, and Cloudflare WAF for custom domains — no server access needed.
Read guide →How to Block AI Bots on Magento / Adobe Commerce: Complete 2026 Guide
Magento stores are prime AI scraping targets — product descriptions, pricing, and reviews are exactly what Diffbot sells to AI labs. Block 25+ crawlers via the built-in Admin panel robots.txt editor, noai meta tags via layout XML, .htaccess, and Cloudflare or Fastly WAF.
Read guide →How to Block AI Bots on Joomla: Complete 2026 Guide
Joomla's default robots.txt doesn't block a single AI crawler — it was written before they existed. Here's how to fix that: edit the root robots.txt directly, add noai meta tags via the template editor, add .htaccess rules for Apache, and use Cloudflare WAF for edge blocking.
Read guide →How to Block AI Bots on Drupal: Complete 2026 Guide
Drupal powers government agencies, universities, and major publishers — all prime AI training targets. Four methods: edit the static robots.txt file, use the robotstxt module (no SSH needed), add noai meta tags via Metatag module, and block with .htaccess or Cloudflare WAF.
Read guide →How to Block AI Bots on Ghost: Complete 2026 Guide
Ghost produces clean semantic HTML — exactly what AI training pipelines prefer. Block GPTBot, CCBot, and Diffbot with Code Injection (all plans), self-hosted theme robots.txt or nginx alias, and Cloudflare WAF for Ghost(Pro).
Read guide →How to Block AI Bots on Framer: Complete 2026 Guide
Framer has no built-in robots.txt editor — but you can still block AI crawlers with a noai meta tag via Custom Code (paid plans) and Cloudflare WAF. Includes a Cloudflare Worker script for a proper custom robots.txt.
Read guide →How to Block AI Bots on Webflow: Complete 2026 Guide
Webflow gives you two clean methods: noai meta tags via Site Settings → Custom Code (all plans, free included), and a built-in robots.txt editor (paid plans). Full instructions plus Cloudflare WAF for bots that ignore robots.txt.
Read guide →How to Block AI Bots on Wix: Complete 2026 Guide
Wix has a built-in robots.txt editor most users don't know about. Step-by-step guide for the Crawlers & Indexing panel, noai meta tags via Custom Code, and Cloudflare WAF for free plan users and robots.txt violators.
Read guide →How to Block AI Bots on Squarespace: Complete 2026 Guide
Squarespace limits what you can do — but there are still options. Crawlers panel (Business plan+), noai tags via Code Injection, and Cloudflare WAF for all plans including Personal. Full plan comparison included.
Read guide →How to Block AI Bots on Shopify: Complete 2026 Guide
Shopify's default robots.txt blocks nothing useful — AI crawlers have free run of your products and blog. Here's how to edit robots.txt.liquid, add noai meta tags to theme.liquid, and use Cloudflare WAF to stop even robots.txt violators.
Read guide →How to Block AI Bots on WordPress: Complete 2026 Guide
WordPress generates a virtual robots.txt that allows all AI crawlers by default. Here's every method — Yoast/AIOSEO robots.txt editing, physical file upload, .htaccess server blocking, and Cloudflare WAF rules — for all 30+ AI bots.
Read guide →Generative Engine Optimization (GEO): The 2026 Playbook
GEO is SEO for AI search. Get cited by ChatGPT, Perplexity, and Google AI Overviews — structured data strategy, llms.txt, content signals, and a complete audit checklist.
Read guide →Is AI Using My Website Content? How to Find Out (and Stop It)
Almost certainly yes — if you haven't blocked AI bots. Check your server logs, see which bots have visited, and stop future crawls in under 10 minutes.
Read guide →How to Opt Out of AI Training: Every Method, Ranked (2026)
Six ways to stop AI companies training on your content — robots.txt, meta tags, TDMRep, llms.txt, company opt-out forms, and server blocking. What works, what doesn't, and what to do first.
Read guide →robots.txt for AI Bots: The Complete 2026 Guide
Control GPTBot, ClaudeBot, PerplexityBot, Bytespider, and 51+ crawlers. Ready-to-use configs, per-bot reference table, and the 5 mistakes that break your SEO.
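A small sketch of generating such a config and checking it with Python's standard-library robots.txt parser. The bot list is an illustrative subset of the tokens named above, and build_robots_txt and is_allowed are hypothetical helper names.

```python
from urllib.robotparser import RobotFileParser

# Illustrative subset of AI crawler tokens.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Bytespider"]


def build_robots_txt(blocked_bots):
    """Return robots.txt text that disallows each bot and allows everyone else."""
    groups = [f"User-agent: {bot}\nDisallow: /\n" for bot in blocked_bots]
    # The catch-all group keeps ordinary search crawlers unaffected.
    groups.append("User-agent: *\nAllow: /\n")
    return "\n".join(groups)


def is_allowed(robots_txt, user_agent, url="https://example.com/"):
    """Check a user agent against the rules with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Checking the generated file this way catches the common slip of blocking every crawler when only AI bots were intended.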
Read guide →noai & noimageai: Block AI Training with Meta Tags
Opt out of AI training on a per-page basis without touching robots.txt. HTML meta tag and X-Robots-Tag examples for every server stack, plus CMS quick guides.
Read guide →llms.txt: The Complete Guide for 2026
The emerging standard that tells AI assistants exactly what your site is about. Full spec, copy-paste templates for Next.js, static sites, and WordPress, plus AI adoption table.
Read guide →How AI Search Engines Decide What to Surface (2026)
What PerplexityBot, OAI-SearchBot, Google-Extended, and Claude look for when choosing which pages to feature in AI answers. The 7 signals that matter, with a full optimisation checklist.
Read guide →Blocking Bytespider: Why robots.txt Isn't Enough
ByteDance's Bytespider crawler has been documented ignoring robots.txt. Here's how to block it at the server level: nginx, Cloudflare WAF, Vercel, Apache, and Next.js middleware.
Read guide →AI Readiness Score: What It Measures and How to Improve It
A breakdown of all 6 scoring categories, every check, grade thresholds, and the fastest path from a D to an A — based on the actual Open Shadow scanner methodology.
Read guide →How to Block Google-Extended: Stop Gemini AI Training
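Google-Extended is the robots.txt token Google uses to control AI training for Gemini (formerly Bard); it is a control token rather than a separate crawler with its own user agent string. Block it in robots.txt without touching your Search rankings, with verification steps and Next.js config.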
Read guide →How to Block GPTBot: Stop OpenAI Training on Your Site
GPTBot is OpenAI's training crawler for GPT-4 and beyond. Block it in robots.txt in 60 seconds — plus the critical difference between GPTBot, ChatGPT-User, and OAI-SearchBot.
Read guide →How to Block ClaudeBot: Stop Anthropic Training on Your Site
ClaudeBot is Anthropic's training crawler for Claude models. Block both ClaudeBot and anthropic-ai tokens — plus how to request removal of already-crawled content.
Read guide →How to Block PerplexityBot: Scraping Controversy Explained
PerplexityBot was at the centre of a 2024 robots.txt controversy. Block both PerplexityBot and perplexity-user — and understand the visibility tradeoff before you do.
Read guide →How to Block meta-externalagent: Stop Meta Training Llama on Your Site
Meta runs two crawlers — one for link previews, one for AI training. Most guides only cover the preview bot. Here's how to block the one that trains Llama without breaking your Facebook shares.
Read guide →How to Block CCBot: One Rule That Stops 50+ AI Models
Common Crawl's CCBot feeds training data to GPT, Gemini, Llama, Mistral, Falcon, and most open-source LLMs. Blocking it is the highest-leverage AI training opt-out you can make. One robots.txt line.
Read guide →Bingbot & Microsoft Copilot: Control What Copilot Knows About Your Site
Copilot draws from Bing's index — and so does ChatGPT Search. Blocking Bingbot removes you from both, but also kills your Bing Search traffic. Here's the full pipeline and the right call for your situation.
Read guide →How to Block ChatGPT-User: Stop Real-Time Browsing on Your Site
ChatGPT-User isn't a crawler — it fires when a user explicitly asks ChatGPT to read a URL. Blocking it stops on-demand page reads. Essential for paywalled publishers. Zero effect on training or search indexing.
Read guide →How to Block OAI-SearchBot: Control Your ChatGPT Search Presence
OAI-SearchBot indexes your site for ChatGPT Search — it's NOT the training crawler. Blocking it removes you from ChatGPT Search results. Here's the three-bot breakdown and when each block makes sense.
Read guide →How to Block Applebot-Extended: Stop Apple Intelligence Training
Applebot-Extended is Apple's AI training crawler — separate from the Applebot that powers Siri and Spotlight. Block AI training without losing your Spotlight or App Store presence.
Read guide →How to Block Diffbot: The AI Data Broker Feeding Llama & Mistral
Diffbot isn't a search engine — it's a commercial data broker that crawls your site and sells structured content to AI companies. One block cuts supply to Meta Llama, Mistral, DiffbotLLM, and more.
Read guide →How to Block xAI-Bot: Stop Grok from Training on Your Site
xAI-Bot is Elon Musk's crawler for training Grok — embedded inside X (Twitter). It actively targets news and real-time content. Here's how to opt out and what the X/Twitter data pipeline means for publishers.
Read guide →How to Block MistralBot: Stop Europe's Leading AI Lab from Training on Your Site
MistralBot is Mistral AI's training crawler — the French lab behind Mistral Large, Mixtral, and Le Chat. GDPR and the EU AI Act give publishers extra leverage here. Plus: why blocking CCBot too is the full fix.
Read guide →How to Block DeepSeekBot: Stop DeepSeek from Training on Your Site
DeepSeekBot crawls your site for DeepSeek's frontier models — V3, R1, and beyond. The crawler that stunned the AI world in 2025. What makes it different: Chinese jurisdiction, outside GDPR and US AI regulation. Here's the full opt-out.
Read guide →How to Block Amazonbot: Amazon's 3-Crawler Ecosystem Explained
Amazon runs three distinct bots: Amazonbot (AI training), Amzn-SearchBot (Rufus AI + Alexa), and Amzn-User (live queries). Most guides miss the difference. Blocking the wrong one kills your Rufus visibility. Here's the full breakdown.
Read guide →How to Block YouBot: You.com's AI Search Crawler
YouBot indexes your site for You.com's AI assistant answers — it's a search crawler, not a training crawler. Blocking it removes you from You.com results. Here's the tradeoff, the decision matrix, and the robots.txt config.
Read guide →How to Block AI2Bot: Allen Institute's Two AI Crawlers Explained
The Allen Institute for AI runs two separate web crawlers: AI2Bot (academic research + Semantic Scholar) and Ai2Bot-Dolma (the open-source training dataset powering OLMo). Different purposes — different blocking decisions. Here's the breakdown.
Read guide →How to Block cohere-ai: Cohere's Undocumented Web Crawler
cohere-ai crawls publisher sites without official documentation explaining what it collects. Operated by Cohere — the enterprise AI lab behind Command R and Embed. Only ~13% of major sites block it. Here's the robots.txt config and the full story.
Read guide →How to Block DuckAssistBot: DuckDuckGo's AI Answer Crawler
The privacy-first search engine deployed its own AI crawler. DuckAssistBot powers DuckDuckGo AI summaries and Duck.ai — and it's separate from their search indexer. Block AI answers without losing DuckDuckGo search rankings.
Read guide →How to Block Gemini-Deep-Research: Google's AI Research Crawler
Gemini-Deep-Research reads your entire site to compile AI reports for Gemini Advanced users — it's not the training crawler. Here's the full Google AI bot ecosystem, how Gemini-Deep-Research differs from Google-Extended, and how to block it without killing your SEO.
Read guide →How to Block Google-NotebookLM: Google's Viral AI Notebook Crawler
NotebookLM went viral by turning any URL into an AI podcast. The crawler behind it reads your pages when users add your site as a source — turning your content into AI audio without a click to your site. Here's how to block it.
Read guide →How to Block Webz.io & Omgili: The AI Data Broker Behind Three Crawlers
Webz.io operates under three identities — omgili, omgilibot, and webzio-extended — selling your web content to AI companies. One Disallow rule isn't enough. Here's how to block all three and why only webzio-extended needs blocking to stop AI training.
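As a rough sketch of the three-identity point, the snippet below uses Python's standard `urllib.robotparser` to check that one robots.txt group naming `omgili`, `omgilibot`, and `webzio-extended` disallows all three. The robots.txt text, the `blocked_agents` helper, and the `example.com` URL are illustrative assumptions, not Webz.io's documented configuration. robots.txt is advisory, so this only verifies what the file says.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: one group listing all three tokens named above.
ROBOTS_TXT = """\
User-agent: omgili
User-agent: omgilibot
User-agent: webzio-extended
Disallow: /
"""


def blocked_agents(robots_txt, agents, url="https://example.com/article"):
    """Return the agents that this robots.txt disallows from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    candidates = ["omgili", "omgilibot", "webzio-extended", "Googlebot"]
    print(blocked_agents(ROBOTS_TXT, candidates))
```

Running the same check against your live robots.txt is a quick way to confirm that none of the three tokens was missed, and that ordinary search crawlers are still allowed.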
Read guide →AI Content Protection Tools Compared: Free & Paid (2026)
From a one-line robots.txt edit to enterprise bot management. Honest comparison of every tool available — free and paid — with a decision framework based on your actual risk level. No upsell, just what works.
Read guide →How to Monitor AI Bot Traffic on Your Site
Most site owners have zero visibility into AI bot traffic — Google Analytics doesn't show it. Learn 5 methods: server log analysis, dedicated bot log files, Next.js middleware, Cloudflare analytics, and real-time monitoring with alerts.
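A minimal sketch of the server log analysis method, assuming access logs in the common "combined" format, where the User-Agent is the last double-quoted field. The `AI_BOT_TOKENS` watch list and the `count_ai_bot_hits` helper are hypothetical names for illustration. The tokens are crawler names that appear in the guides on this page, not an exhaustive list.

```python
import re
from collections import Counter

# Assumed watch list: crawler names taken from the guides on this page.
AI_BOT_TOKENS = ["GPTBot", "ClaudeBot", "CCBot", "Amazonbot", "MistralBot"]

# In the combined log format the User-Agent is the last double-quoted field.
_UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_ai_bot_hits(log_lines):
    """Count requests per AI bot token, matching case-insensitively on the UA."""
    hits = Counter()
    for line in log_lines:
        match = _UA_FIELD.search(line)
        if not match:
            continue  # line is not in the expected format; skip it
        user_agent = match.group(1).lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break  # count each request at most once
    return hits


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [01/Jan/2026:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
        '203.0.113.9 - - [01/Jan/2026:00:00:02 +0000] "GET /about HTTP/1.1" 200 734 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    ]
    for bot, count in count_ai_bot_hits(sample).items():
        print(bot, count)
```

In practice you would pass an open log file instead of the sample list, since the function only iterates over lines.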
Read guide →How to Block AI Agents: When robots.txt Isn't Enough
AI agents don't crawl — they browse. Firecrawl, browser-use, Playwright MCP, and Stagehand bypass robots.txt entirely. Five defence layers that actually work: headless detection, behavioural analysis, honeypots, rate limiting, and TLS fingerprinting.
Read guide →TDMRep: The W3C Protocol That Gives Your AI Opt-Out Legal Teeth
robots.txt is a gentleman's agreement. TDMRep is backed by EU law. The W3C's Text and Data Mining Reservation Protocol lets you formally reserve rights over your content — with legal enforcement under the EU AI Act and CDSM Directive.
Read guide →How to Block AI Bots on Actix-web (Rust): Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and 60+ AI crawlers on Actix-web — wrap_fn middleware for global 403 blocking, Scope::wrap() for per-group control, actix_files::Files for robots.txt, and X-Robots-Tag on all responses.
Read guide →How to Block AI Bots on Axum (Rust): Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and 60+ AI crawlers on Axum — from_fn middleware for global 403 blocking, route_layer vs layer distinction, tower_http ServeDir for robots.txt, and X-Robots-Tag on all responses.
Read guide →How to Block AI Bots on Warp (Rust): Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and 60+ AI crawlers on Warp — filter combinators for UA-based rejection, recover() for 403 responses, warp::fs::dir for robots.txt, and with_header for X-Robots-Tag.
Read guide →How to Block AI Bots in Rust Salvo
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Rust Salvo using #[handler] middleware — req.headers().get() returns Option<&HeaderValue>, ctrl.skip_rest() stops the chain. Covers .hoop() registration and nested router scoping.
Read guide →How to Block AI Bots on Gin (Go): Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and AI crawlers on Gin — robots.txt via r.StaticFile(), X-Robots-Tag and hard 403 via gin.HandlerFunc middleware with c.AbortWithStatus(). Covers global r.Use() and route-group blocking.
Read guide →How to Block AI Bots on Echo (Go): Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and AI crawlers on Echo — robots.txt via e.File(), X-Robots-Tag and hard 403 via echo.MiddlewareFunc wrapper with next(c) pass-through. Covers global e.Use() and route-group blocking. Works with Echo v4 and v5.
Read guide →How to Block AI Bots in Go Gorilla Mux
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Gorilla Mux using the func(http.Handler) http.Handler middleware pattern. Covers r.Header.Get(), http.Error() short-circuit, next.ServeHTTP(), router.Use(), and subrouter scoping.
Read guide →How to Block AI Bots in Go Buffalo
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Go Buffalo using middleware. Covers buffalo.Handler, app.Use(), c.Render(403), and X-Robots-Tag injection.
Read guide →How to Block AI Bots in Go Iris
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Go Iris v12 using app.Use() middleware — ctx.GetHeader() returns empty string (not nil) when absent, ctx.StopWithText() to block, ctx.Next() to pass. Covers UseGlobal vs Use and party-scoped protection.
Read guide →How to Block AI Bots in Python BlackSheep
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in BlackSheep using the @app.middleware decorator and bytes-based header access. Covers get_first_header() and response injection.
Read guide →How to Block AI Bots in Python Quart
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Quart using async before_request hooks and after_request response injection. Covers async abort(), make_response(), ASGI deployment with Hypercorn, and Trio vs asyncio backends.
Read guide →How to Block AI Bots in Python Robyn
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Python Robyn using @app.before_request() middleware — headers are lowercase, return Response to block, return Request to pass. Covers after_request header injection and SubRouter scoping.
Read guide →How to Block AI Bots in Java Dropwizard
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Dropwizard using a Jersey ContainerRequestFilter with @PreMatching and abortWith(). Covers getHeaderString(), UriInfo path checks, @Provider registration, and Jetty servlet filter alternative.
Read guide →How to Block AI Bots in Javalin
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Javalin using before-handlers and ForbiddenResponse. Covers ctx.header() access, lowercase matching, route-group scoping, after-handler X-Robots-Tag injection, and static robots.txt serving.
Read guide →How to Block AI Bots in Java Helidon SE
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Java Helidon SE using a routing filter — any() handler, req.headers().first(), res.send() to block, res.next() to pass. Covers Helidon 4.x virtual threads and @PreMatching ContainerRequestFilter for Helidon MP.
Read guide →How to Block AI Bots in SparkJava
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in SparkJava using before() filters and halt(403). request.headers() is case-insensitive (Jetty). halt() throws HaltException — code after it is unreachable. Covers path-scoped filters.
Read guide →How to Block AI Bots in Kotlin Http4k
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Kotlin Http4k using a Filter — a (HttpHandler) -> HttpHandler function. Covers req.header(), Response(FORBIDDEN).header(), Filter.then(), and multi-backend server setup with SunHttp, Netty, and Undertow.
Read guide →How to Block AI Bots in Akka HTTP
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Akka HTTP (Scala). Use mapInnerRoute + optionalHeaderValueByName for a Directive0. Covers custom directives, complete(StatusCodes.Forbidden), and route composition.
Read guide →How to Block AI Bots on http4s (Scala): Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and 60+ AI crawlers on http4s — Kleisli middleware for 403 short-circuit, HttpRoutes composition, StaticFile for robots.txt, X-Robots-Tag on all responses.
Read guide →How to Block AI Bots in Scala Scalatra
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Scalatra using the before() filter and halt() short-circuit. Covers request.getHeader(), response.setHeader(), ScalatraFilter for embedded Jetty, and LifeCycle bootstrap registration.
Read guide →How to Block AI Bots on ZIO HTTP: Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and 60+ AI crawlers on ZIO HTTP — HandlerAspect middleware, request.header(Header.UserAgent) option access, Response.status(Status.Forbidden) short-circuit, X-Robots-Tag on all responses.
Read guide →How to Block AI Bots in PHP Hyperf
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in PHP Hyperf using PSR-15 middleware — getHeaderLine() returns empty string when absent, return ResponseInterface to block, $handler->handle() to pass. Covers Swoole persistent-process gotchas.
Read guide →How to Block AI Bots in PHP Laminas
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Laminas MVC and Mezzio. Covers SharedEventManager with MvcEvent, getFieldValue() on Header objects, stopPropagation(), and PSR-15 middleware for Mezzio.
Read guide →How to Block AI Bots in PHP Phalcon
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in PHP Phalcon using before() handlers and PSR-15 middleware. Covers Micro app pipeline, strtolower(), return false short-circuit, and X-Robots-Tag injection.
Read guide →How to Block AI Bots in PHP ReactPHP
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in PHP ReactPHP. Middleware returns a PromiseInterface — use Promise::resolve(new Response(403)) to block. getHeaderLine() returns empty string when absent.
Read guide →How to Block AI Bots in Ruby Grape
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Grape using the before block and error!() short-circuit. Covers headers[] access, header() response setter, Rack middleware mounting, Rails integration via mount, and namespace scoping.
Read guide →How to Block AI Bots in Ruby Hanami
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Ruby Hanami using Rack middleware. Covers middleware registration, per-action guards, and X-Robots-Tag injection.
Read guide →How to Block AI Bots in Ruby Roda
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Roda using the hooks plugin before block, request.halt with a Rack response array, and Rack env HTTP_USER_AGENT. Covers plugin :hooks, routing tree short-circuit, and Rack middleware alternative.
Read guide →How to Block AI Bots in Perl Dancer2
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Dancer2 using the before hook and send_error(). Covers request->header() access, lc() matching, send_error short-circuit, after hook X-Robots-Tag injection, and PSGI deployment.
Read guide →How to Block AI Bots in Perl Mojolicious
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Mojolicious using the before_dispatch hook, under route guards, and Plack/PSGI middleware. Covers $c->req->headers->user_agent, lc() matching, render() short-circuit, and X-Robots-Tag injection.
Read guide →How to Block AI Bots on Elixir Plug: Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and 60+ AI crawlers on Elixir Plug — custom Plug module with halt() for pipeline short-circuit, Plug.Builder for global pipelines, Plug.Router for per-scope blocking, Plug.Static for robots.txt.
Read guide →How to Block AI Bots on Erlang Cowboy: Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and 60+ AI crawlers on Erlang Cowboy — cowboy_middleware behaviour with execute/2, binary:match for UA substring check, {stop, Req} to halt, cowboy_static for robots.txt.
Read guide →How to Block AI Bots in Node.js Polka
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Polka using Express-compatible (req, res, next) middleware with raw Node.js ServerResponse. Covers req.headers[], res.writeHead(), res.end(), .use() registration, and sub-application scoping.
Read guide →How to Block AI Bots in Node.js Restify
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Node.js Restify using server.use() middleware: req.headers keys are lowercase, res.send(403) blocks, next() passes. Covers server.pre() vs server.use() and route-level inline middleware.
Read guide →How to Block AI Bots in C++ Drogon
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Drogon using Filter classes and the doFilter() callback chain. Covers req->getHeader(), fccb(resp) short-circuit, fccb(nullptr) pass-through, coroutine filters, and config.json registration.
Read guide →How to Block AI Bots on Haskell WAI + Warp: Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and 60+ AI crawlers on Haskell WAI — Middleware type (Application -> Application), CI ByteString case-insensitive headers, responseLBS 403 short-circuit, X-Robots-Tag on all responses.
Read guide →How to Block AI Bots on Clojure Ring: Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and 60+ AI crawlers on Clojure Ring — wrap-bot-blocker middleware for 403 short-circuit, Compojure and Reitit integration, ring.middleware.resource for robots.txt, X-Robots-Tag on all responses.
Read guide →How to Block AI Bots on Crystal Kemal: Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and 60+ AI crawlers on Crystal Kemal — before_all filter with halt() for 403 short-circuit, static_folder for robots.txt, X-Robots-Tag on all responses.
Read guide →How to Block AI Bots on Dart Shelf: Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and 60+ AI crawlers on Dart Shelf — Middleware typedef with Pipeline composition, Response.forbidden() short-circuit, shelf_static for robots.txt, X-Robots-Tag on all responses.
Read guide →How to Block AI Bots on .NET Minimal API: Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and 60+ AI crawlers on .NET Minimal API — app.Use middleware for global 403, IEndpointFilter for per-endpoint blocking, MapGroup for scoped protection, UseStaticFiles for robots.txt.
Read guide →How to Block AI Bots on F# Giraffe: Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and 60+ AI crawlers on F# Giraffe — HttpHandler composition with >=> fish operator, earlyReturn short-circuit, setStatusCode 403, X-Robots-Tag on all responses.
Read guide →How to Block AI Bots in Julia Genie.jl
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Julia Genie.jl using HTTP.jl middleware. Covers occursin() pattern matching, HTTP.Response(403), and X-Robots-Tag injection.
Read guide →How to Block AI Bots on Gleam + Wisp: Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and 60+ AI crawlers on Gleam with Wisp — middleware via the use keyword, Result-safe header access, path routing for robots.txt, X-Robots-Tag on all responses.
Read guide →How to Block AI Bots in Common Lisp Hunchentoot
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Hunchentoot using *before-request-hook*, abort-request-handler, and a custom acceptor class. Covers header-in*, header-out setf, return-code*, and CLOS method dispatch.
Read guide →How to Block AI Bots on Kong Gateway: Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and 60+ AI crawlers on Kong Gateway — custom Lua plugin for 403 blocking, declarative YAML config, global vs per-service/route scope, X-Robots-Tag on all responses.
Read guide →How to Block AI Bots in Lua Lapis
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Lapis using before_filter and table-return short-circuit. Covers self.req.headers access, respond_to scoping, MoonScript variant, and OpenResty nginx.conf integration.
Read guide →How to Block AI Bots on OpenResty (Nginx + Lua): Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and 60+ AI crawlers on OpenResty — access_by_lua_block for 403 at the access phase, header_filter_by_lua_block for X-Robots-Tag, init_by_lua_block for shared bot patterns.
Read guide →How to Block AI Bots in Nim Jester
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Nim Jester using before/after hooks. Covers halt(), header injection, robots.txt bypass, and static file serving.
Read guide →How to Block AI Bots on OCaml Dream: Complete 2026 Guide
Block GPTBot, ClaudeBot, CCBot, and 60+ AI crawlers on OCaml Dream — middleware function composition, Dream.header option safety, Dream.respond ~status:`Forbidden short-circuit, X-Robots-Tag on all responses.
Read guide →How to Block AI Bots in R Plumber
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in R Plumber using pr_filter(). Covers Rook-style header access, grepl() pattern matching, filter short-circuit without forward(), and X-Robots-Tag injection.
Read guide →How to Block AI Bots in Zig Zap
Block AI crawlers like GPTBot, ClaudeBot, and CCBot in Zig Zap using comptime pattern arrays and a middleware-style handler wrapper. Covers std.ascii.lowerString, setStatus(.forbidden), and X-Robots-Tag injection.
Read guide →