How to Block AI Bots on Actix-web (Rust): Complete 2026 Guide
Actix-web is among Rust's most-starred web frameworks — built on Tokio, known for raw throughput. Its middleware model offers middleware::from_fn() for simple async functions and the Transform + Service trait pair for full lifecycle control. Both can short-circuit with a 403 response before the route handler ever runs — unlike Rocket fairings.
Middleware CAN abort requests
Unlike Rocket fairings, Actix-web middleware can terminate a request before the handler runs. In a from_fn middleware, returning a response without calling next.call(req) short-circuits the entire handler chain. The inner service — database queries, template rendering, everything — never executes. Zero wasted computation for blocked bots.
Protection layers
Dependencies (Cargo.toml)
# Cargo.toml — required dependencies
[dependencies]
actix-web = "4"
actix-files = "0.6" # for static file serving (robots.txt)
tokio = { version = "1", features = ["full"] }
serde_json = "1" # optional — for JSON responses
futures-util = "0.3" # for LocalBoxFuture in full middleware
Step 1 — Shared bot list (src/bots.rs)
A &[&str] slice is zero-cost — strings are embedded in the binary. is_ai_bot() lowercases before checking so GPTBot and gptbot both match.
// src/bots.rs — shared AI bot list
pub const AI_BOTS: &[&str] = &[
// OpenAI
"gptbot", "chatgpt-user", "oai-searchbot",
// Anthropic
"claudebot", "claude-web",
// Common Crawl
"ccbot",
// Bytedance
"bytespider",
// Meta
"meta-externalagent",
// Perplexity
"perplexitybot",
// Google AI
"google-extended", "googleother",
// Cohere
"cohere-ai",
// Amazon
"amazonbot",
// Diffbot
"diffbot",
// AI2
"ai2bot",
// DeepSeek
"deepseekbot",
// Mistral
"mistralai-user",
// xAI
"xai-bot",
// You.com
"youbot",
// DuckDuckGo AI
"duckassistbot",
];
pub fn is_ai_bot(user_agent: &str) -> bool {
let ua = user_agent.to_lowercase();
AI_BOTS.iter().any(|bot| ua.contains(bot))
}
Step 2 — Global middleware via from_fn (recommended)
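Before wiring the matcher into middleware, it is worth sanity-checking in isolation. A self-contained sketch (the bot list is abbreviated here; the real code uses the full AI_BOTS from src/bots.rs):

```rust
// Standalone check of the Step 1 matcher logic (abbreviated bot list).
const AI_BOTS: &[&str] = &["gptbot", "claudebot", "ccbot", "bytespider"];

fn is_ai_bot(user_agent: &str) -> bool {
    let ua = user_agent.to_lowercase();
    AI_BOTS.iter().any(|bot| ua.contains(bot))
}

fn main() {
    // Real-world UA strings match on the substring...
    assert!(is_ai_bot("Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot"));
    // ...case-insensitively, thanks to the lowercasing step.
    assert!(is_ai_bot("CCBot/2.0"));
    // Ordinary browsers pass through.
    assert!(!is_ai_bot("Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"));
    println!("matcher ok");
}
```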
middleware::from_fn() wraps an async function or closure as middleware. Check the User-Agent early — if it's an AI bot, return the 403 immediately. The inner service never runs. For legitimate requests, call next.call(req).await and add the X-Robots-Tag header to the response.
// src/main.rs — global AI bot blocking with wrap_fn (simplest approach)
use actix_web::{web, App, HttpServer, HttpResponse, middleware};
use actix_web::dev::{ServiceRequest, ServiceResponse};
use actix_web::Error;
mod bots;
/// Simple middleware closure — checks UA, short-circuits with 403 for AI bots.
/// next.call(req) passes through to the inner service (the route handler).
async fn ai_bot_blocker(
req: ServiceRequest,
next: middleware::Next<impl actix_web::body::MessageBody>,
) -> Result<ServiceResponse<impl actix_web::body::MessageBody>, Error> {
let ua = req
.headers()
.get("user-agent")
.and_then(|v| v.to_str().ok())
.unwrap_or("");
if bots::is_ai_bot(ua) {
// Short-circuit — handler never runs
let response = HttpResponse::Forbidden()
.insert_header(("x-robots-tag", "noai, noimageai"))
.body("Forbidden");
return Ok(req.into_response(response));
}
// Pass through — run the actual route handler
// map_into_boxed_body() unifies the body type with the 403 branch above,
// so both arms return ServiceResponse<BoxBody>
let mut res = next.call(req).await?.map_into_boxed_body();
// Add X-Robots-Tag to all non-blocked responses too
res.response_mut()
.headers_mut()
.insert(
actix_web::http::header::HeaderName::from_static("x-robots-tag"),
actix_web::http::header::HeaderValue::from_static("noai, noimageai"),
);
Ok(res)
}
#[actix_web::main]
async fn main() -> std::io::Result<()> {
HttpServer::new(|| {
App::new()
// Global — fires on every request before any handler
.wrap(middleware::from_fn(ai_bot_blocker))
.service(
web::scope("/api")
.route("/data", web::get().to(api_data))
)
.route("/", web::get().to(index))
.route("/health", web::get().to(health))
// Serve static files including robots.txt — mount last
.service(actix_files::Files::new("/", "./static"))
})
.bind("0.0.0.0:8080")?
.run()
.await
}
async fn index() -> impl actix_web::Responder {
"Welcome to my site"
}
async fn api_data() -> impl actix_web::Responder {
HttpResponse::Ok().json(serde_json::json!({ "data": "sensitive" }))
}
// Health check — still blocked by global middleware if AI bot
async fn health() -> impl actix_web::Responder {
"ok"
}
Step 3 — Per-route-group blocking with Scope::wrap()
When you only want to protect specific routes (e.g., your API), use web::scope().wrap(). Public routes like /health and /robots.txt remain fully accessible to all crawlers.
// Per-route-group blocking — only /api/* routes are protected
// Public routes (/health, /, /robots.txt) are unaffected.
use actix_web::{web, App, HttpServer, middleware};
#[actix_web::main]
async fn main() -> std::io::Result<()> {
HttpServer::new(|| {
App::new()
// Public routes — no middleware
.route("/", web::get().to(index))
.route("/health", web::get().to(health))
// Protected API scope — AI bots get 403 on all /api/* routes
.service(
web::scope("/api")
.wrap(middleware::from_fn(ai_bot_blocker))
.route("/data", web::get().to(api_data))
.route("/users", web::get().to(users))
)
// Serve robots.txt from static dir — unprotected by design
.service(actix_files::Files::new("/", "./static"))
})
.bind("0.0.0.0:8080")?
.run()
.await
}
Step 4 — Full Transform + Service middleware (advanced)
For dynamic block-lists, shared state (e.g., an Arc<RwLock<HashSet>> updated from Redis), or custom error types — implement Transform and Service directly. The new_transform() method runs once at startup; call() runs per-request.
// Full Transform + Service middleware — for dynamic block-lists or shared state
use actix_web::dev::{forward_ready, Service, ServiceRequest, ServiceResponse, Transform};
use actix_web::{Error, HttpResponse};
use futures_util::future::{ok, LocalBoxFuture, Ready};
use std::sync::Arc;
use crate::bots::is_ai_bot;
// The middleware factory — implements Transform
pub struct AiBotMiddleware;
impl<S, B> Transform<S, ServiceRequest> for AiBotMiddleware
where
S: Service<ServiceRequest, Response = ServiceResponse<B>, Error = Error>,
S::Future: 'static,
B: 'static,
{
type Response = ServiceResponse<actix_web::body::EitherBody<B>>;
type Error = Error;
type InitError = ();
type Transform = AiBotMiddlewareService<S>;
type Future = Ready<Result<Self::Transform, Self::InitError>>;
fn new_transform(&self, service: S) -> Self::Future {
ok(AiBotMiddlewareService { service: Arc::new(service) })
}
}
// The middleware service — implements Service
pub struct AiBotMiddlewareService<S> {
service: Arc<S>,
}
impl<S, B> Service<ServiceRequest> for AiBotMiddlewareService<S>
where
S: Service<ServiceRequest, Response = ServiceResponse<B>, Error = Error>,
S::Future: 'static,
B: 'static,
{
type Response = ServiceResponse<actix_web::body::EitherBody<B>>;
type Error = Error;
type Future = LocalBoxFuture<'static, Result<Self::Response, Self::Error>>;
forward_ready!(service);
fn call(&self, req: ServiceRequest) -> Self::Future {
let ua = req
.headers()
.get("user-agent")
.and_then(|v| v.to_str().ok())
.unwrap_or("")
.to_owned();
let svc = Arc::clone(&self.service);
Box::pin(async move {
if is_ai_bot(&ua) {
let (http_req, _payload) = req.into_parts();
let response = HttpResponse::Forbidden()
.insert_header(("x-robots-tag", "noai, noimageai"))
.body("Forbidden");
let res = ServiceResponse::new(http_req, response)
.map_into_right_body(); // BoxBody fills EitherBody's right slot
return Ok(res);
}
let res = svc.call(req).await?;
// Add X-Robots-Tag to all legitimate responses
let mut res = res.map_into_left_body(); // inner body B fills the left slot
res.response_mut()
.headers_mut()
.insert(
actix_web::http::header::HeaderName::from_static("x-robots-tag"),
actix_web::http::header::HeaderValue::from_static("noai, noimageai"),
);
Ok(res)
})
}
}
// Usage:
// App::new().wrap(AiBotMiddleware).service(...)
Step 5 — robots.txt
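Step 4's listing checks the static list, but its intro also motivates dynamic block-lists behind an Arc<RwLock<HashSet>>. A minimal sketch of that shared-state piece (the SharedBots type and its method names are illustrative, not part of Actix-web); the Transform factory would hold one instance and clone it into each service it builds:

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

/// Runtime-updatable block-list. Clone is cheap (it copies the Arc),
/// so the middleware factory can hand a clone to every service.
#[derive(Clone)]
pub struct SharedBots {
    inner: Arc<RwLock<HashSet<String>>>,
}

impl SharedBots {
    pub fn new(initial: &[&str]) -> Self {
        let set = initial.iter().map(|s| s.to_lowercase()).collect();
        Self { inner: Arc::new(RwLock::new(set)) }
    }

    /// Case-insensitive substring check, mirroring is_ai_bot() from Step 1.
    pub fn is_blocked(&self, user_agent: &str) -> bool {
        let ua = user_agent.to_lowercase();
        self.inner.read().unwrap().iter().any(|bot| ua.contains(bot.as_str()))
    }

    /// Called from a background refresh task (e.g., polling Redis);
    /// requests see the updated set on their next check.
    pub fn block(&self, bot: &str) {
        self.inner.write().unwrap().insert(bot.to_lowercase());
    }
}

fn main() {
    let bots = SharedBots::new(&["gptbot", "ccbot"]);
    assert!(bots.is_blocked("Mozilla/5.0 GPTBot/1.2"));
    assert!(!bots.is_blocked("newbot/1.0"));
    bots.block("NewBot"); // added at runtime, lowercased on insert
    assert!(bots.is_blocked("newbot/1.0"));
    println!("shared block-list ok");
}
```

In the Step 4 middleware, call() would then use self.bots.is_blocked(&ua) instead of is_ai_bot(&ua).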
Three approaches: static file via actix_files::Files, a dedicated route handler, or compile-time embedding with include_str!(). Mount Files::new() last — Actix-web matches routes in registration order, so handlers registered before Files take priority.
// Option A: Static file via actix-files
// Place robots.txt in ./static/robots.txt
// actix-files serves it automatically at GET /robots.txt
// Mount LAST — route handlers take priority over Files::new
//
// Cargo.toml: actix-files = "0.6"
use actix_files;
App::new()
.route("/", web::get().to(index))
// ... other routes ...
// Files::new("/", "./static") serves any file under ./static at /filename
.service(actix_files::Files::new("/", "./static"))
// Option B: Dedicated route handler (dynamic or compile-time embedded)
use actix_web::{get, HttpResponse, Responder};
const ROBOTS_TXT: &str = "User-agent: *
Allow: /
# AI training bots — blocked
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Meta-ExternalAgent
Disallow: /
User-agent: YouBot
Disallow: /
User-agent: AmazonBot
Disallow: /
User-agent: Diffbot
Disallow: /";
#[get("/robots.txt")]
async fn robots_txt() -> impl Responder {
HttpResponse::Ok()
.content_type("text/plain; charset=utf-8")
.body(ROBOTS_TXT)
}
// Option C: Include from file at compile time
#[get("/robots.txt")]
async fn robots_embedded() -> impl Responder {
HttpResponse::Ok()
.content_type("text/plain; charset=utf-8")
.body(include_str!("../static/robots.txt"))
}
Step 6 — noai meta tag in templates
Add <meta name="robots" content="noai, noimageai"> to every HTML page. With Askama (compile-time templates) or Tera/Handlebars (runtime), put it in your base template. Pair with the X-Robots-Tag header from middleware for belt-and-suspenders coverage.
// noai meta tag in Tera/Handlebars/Askama templates:
// <meta name="robots" content="noai, noimageai">
// With Askama (compile-time templates):
use askama::Template;
use actix_web::{get, HttpResponse};
#[derive(Template)]
#[template(source = r#"
<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="noai, noimageai">
<title>{{ title }}</title>
</head>
<body>{{ content }}</body>
</html>
"#, ext = "html")]
struct PageTemplate {
title: String,
content: String,
}
#[get("/")]
async fn index() -> HttpResponse {
let tmpl = PageTemplate {
title: "My Site".to_string(),
content: "Hello".to_string(),
};
HttpResponse::Ok()
.content_type("text/html; charset=utf-8")
// X-Robots-Tag on the response header (belt + suspenders)
.insert_header(("x-robots-tag", "noai, noimageai"))
.body(tmpl.render().unwrap())
}
Actix-web vs Axum vs Warp vs Rocket
| Feature | Actix-web | Axum | Warp | Rocket |
|---|---|---|---|---|
| Middleware model | from_fn async fn, wrap_fn closure, or Transform+Service trait pair | Tower layers — Service<Request> → Service<Request> | Filter combinators — composable extractors + map/and_then | Request Guards (per-route FromRequest) + Fairings (lifecycle) |
| Can abort request? | Yes — return HttpResponse::Forbidden() before next.call() | Yes — return Response from layer before calling inner service | Yes — return Rejection; map with recover() | Guards: yes. Fairings: no (on_request returns ()). |
| Global blocking | App::wrap(middleware) — all routes | Router::layer(middleware) — all routes | .and(filter) composed at server level | Fairing on_response override (handler still runs) or add guard to every route |
| Per-group blocking | web::scope("/api").wrap(middleware) | Router::nest("/api", ...).layer(middleware) | path("api").and(filter).and(route) | No native scope middleware — use per-route guards |
| UA header access | req.headers().get("user-agent") | req.headers().get("user-agent") | warp::header::optional("user-agent") | req.headers().get_one("User-Agent") |
| Hard 403 | HttpResponse::Forbidden().finish() | (StatusCode::FORBIDDEN, "Forbidden").into_response() | warp::reject::custom(AiBotRejection) | Outcome::Error((Status::Forbidden, ())) |
| Static files / robots.txt | actix_files::Files::new("/", "./static") | tower_http::services::ServeDir | warp::fs::dir("./static") | FileServer::from("./static") or #[get] route |
| Async runtime | Tokio (via #[actix_web::main]) | Tokio (Tower ecosystem) | Tokio | Tokio (via #[launch]) |
Summary
- middleware::from_fn + App::wrap() — one function, global coverage, zero wasted computation. The right default.
- Scope::wrap() — protect only /api/* while keeping public routes open. Add it to any scope, not just App.
- Transform + Service — for dynamic block-lists or shared state. More boilerplate but full control.
- actix_files::Files — mount last to serve robots.txt without blocking legitimate crawlers.
- X-Robots-Tag — add to the response in middleware, after next.call(), so it appears on all legitimate responses.
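To confirm the Step 2 middleware actually short-circuits, Actix-web's built-in test harness can drive it without a running server. A sketch assuming ai_bot_blocker from Step 2 is in scope (run with cargo test):

```rust
// Integration test for the Step 2 middleware, using actix_web::test.
use actix_web::{http::StatusCode, middleware, test, web, App};

#[actix_web::test]
async fn ai_bots_get_403() {
    let app = test::init_service(
        App::new()
            .wrap(middleware::from_fn(ai_bot_blocker))
            .route("/", web::get().to(|| async { "Welcome to my site" })),
    )
    .await;

    // An AI bot UA is short-circuited: 403 plus the noai header.
    let req = test::TestRequest::get()
        .uri("/")
        .insert_header(("user-agent", "GPTBot/1.2"))
        .to_request();
    let res = test::call_service(&app, req).await;
    assert_eq!(res.status(), StatusCode::FORBIDDEN);
    assert_eq!(res.headers().get("x-robots-tag").unwrap(), "noai, noimageai");

    // A browser UA reaches the handler.
    let req = test::TestRequest::get()
        .uri("/")
        .insert_header(("user-agent", "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"))
        .to_request();
    let res = test::call_service(&app, req).await;
    assert_eq!(res.status(), StatusCode::OK);
}
```

The same check can be done against a running server with curl, e.g. curl -A "GPTBot" -i http://localhost:8080/ should show HTTP/1.1 403.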
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.