
How to Block AI Bots on Actix-web (Rust): Complete 2026 Guide

Actix-web is one of Rust's most popular web frameworks — built on Tokio, known for raw throughput. Its middleware model offers middleware::from_fn() for simple async functions and the Transform + Service trait pair for full lifecycle control. Both can short-circuit with a 403 response before the route handler ever runs — unlike Rocket fairings.

Middleware CAN abort requests

Unlike Rocket fairings, Actix-web middleware can terminate a request before the handler runs. In a from_fn middleware, returning early without calling next.call(req) short-circuits the entire handler chain. The inner service — database queries, template rendering, everything — never executes. Zero wasted computation for blocked bots.

Protection layers

1. robots.txt: actix_files::Files or a #[get] handler — served before middleware on the route it's registered at
2. noai meta tag: in HTML templates via Askama, Tera, Handlebars, or inline in handler responses
3. X-Robots-Tag header: added in middleware after next.call() — present on all responses, including 403s
4. Hard 403, global (App::wrap): from_fn middleware on App — fires before any handler on every request. Zero-cost abort.
5. Hard 403, scoped (Scope::wrap): from_fn middleware on web::scope("/api") — only protects /api/* routes. Public routes unaffected.

Dependencies (Cargo.toml)

# Cargo.toml — required dependencies
[dependencies]
actix-web = "4"
actix-files = "0.6"     # for static file serving (robots.txt)
tokio = { version = "1", features = ["full"] }
serde_json = "1"         # optional — for JSON responses
futures-util = "0.3"    # for LocalBoxFuture in full middleware

Step 1 — Shared bot list (src/bots.rs)

A &[&str] slice is zero-cost — strings are embedded in the binary. is_ai_bot() lowercases before checking so GPTBot and gptbot both match.

// src/bots.rs — shared AI bot list
pub const AI_BOTS: &[&str] = &[
    // OpenAI
    "gptbot", "chatgpt-user", "oai-searchbot",
    // Anthropic
    "claudebot", "claude-web",
    // Common Crawl
    "ccbot",
    // Bytedance
    "bytespider",
    // Meta
    "meta-externalagent",
    // Perplexity
    "perplexitybot",
    // Google AI
    "google-extended", "googleother",
    // Cohere
    "cohere-ai",
    // Amazon
    "amazonbot",
    // Diffbot
    "diffbot",
    // AI2
    "ai2bot",
    // DeepSeek
    "deepseekbot",
    // Mistral
    "mistralai-user",
    // xAI
    "xai-bot",
    // You.com
    "youbot",
    // DuckDuckGo AI
    "duckassistbot",
];

pub fn is_ai_bot(user_agent: &str) -> bool {
    let ua = user_agent.to_lowercase();
    AI_BOTS.iter().any(|bot| ua.contains(bot))
}
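
A quick unit check (test names are illustrative, not part of the guide's app) confirms the case-insensitive substring matching:

// src/bots.rs — illustrative unit tests
#[cfg(test)]
mod tests {
    use super::is_ai_bot;

    #[test]
    fn matches_case_insensitively() {
        assert!(is_ai_bot("GPTBot/1.2 (+https://openai.com/gptbot)"));
        assert!(is_ai_bot("Mozilla/5.0 (compatible; ClaudeBot/1.0)"));
    }

    #[test]
    fn passes_ordinary_browsers() {
        assert!(!is_ai_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"));
        assert!(!is_ai_bot(""));
    }
}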

Step 2 — Global middleware via middleware::from_fn (recommended)

middleware::from_fn() wraps an async function or closure as middleware. Check the User-Agent early — if it's an AI bot, return the 403 immediately. The inner service never runs. For legitimate requests, call next.call(req).await and add the X-Robots-Tag header to the response.

// src/main.rs — global AI bot blocking with wrap_fn (simplest approach)
use actix_web::{web, App, HttpServer, HttpResponse, middleware};
use actix_web::dev::{ServiceRequest, ServiceResponse};
use actix_web::Error;

mod bots;

/// from_fn middleware — checks the UA, short-circuits with 403 for AI bots.
/// next.call(req) passes through to the inner service (the route handler).
/// Both branches box the body so the early 403 and the pass-through
/// response share one concrete return type.
async fn ai_bot_blocker(
    req: ServiceRequest,
    next: middleware::Next<impl actix_web::body::MessageBody + 'static>,
) -> Result<ServiceResponse<actix_web::body::BoxBody>, Error> {
    let ua = req
        .headers()
        .get("user-agent")
        .and_then(|v| v.to_str().ok())
        .unwrap_or("");

    if bots::is_ai_bot(ua) {
        // Short-circuit — handler never runs
        let response = HttpResponse::Forbidden()
            .insert_header(("x-robots-tag", "noai, noimageai"))
            .body("Forbidden");
        return Ok(req.into_response(response));
    }

    // Pass through — run the actual route handler, then box the body
    let mut res = next.call(req).await?.map_into_boxed_body();

    // Add X-Robots-Tag to all non-blocked responses too
    res.response_mut()
        .headers_mut()
        .insert(
            actix_web::http::header::HeaderName::from_static("x-robots-tag"),
            actix_web::http::header::HeaderValue::from_static("noai, noimageai"),
        );

    Ok(res)
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| {
        App::new()
            // Global — fires on every request before any handler
            .wrap(middleware::from_fn(ai_bot_blocker))
            .service(
                web::scope("/api")
                    .route("/data", web::get().to(api_data))
            )
            .route("/", web::get().to(index))
            .route("/health", web::get().to(health))
            // Serve static files including robots.txt — mount last
            .service(actix_files::Files::new("/", "./static"))
    })
    .bind("0.0.0.0:8080")?
    .run()
    .await
}

async fn index() -> impl actix_web::Responder {
    "Welcome to my site"
}

async fn api_data() -> impl actix_web::Responder {
    HttpResponse::Ok().json(serde_json::json!({ "data": "sensitive" }))
}

// Health check — still blocked by global middleware if AI bot
async fn health() -> impl actix_web::Responder {
    "ok"
}
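
To confirm the short-circuit end to end, here's a minimal sketch using actix_web::test (the test name and assertions are illustrative):

// Illustrative integration test — bot UA gets 403, X-Robots-Tag is set
#[cfg(test)]
mod tests {
    use super::*;
    use actix_web::{http::StatusCode, test};

    #[actix_web::test]
    async fn blocks_gptbot_globally() {
        let app = test::init_service(
            App::new()
                .wrap(middleware::from_fn(ai_bot_blocker))
                .route("/", web::get().to(index)),
        )
        .await;

        let req = test::TestRequest::get()
            .uri("/")
            .insert_header(("user-agent", "GPTBot/1.1"))
            .to_request();
        let res = test::call_service(&app, req).await;

        assert_eq!(res.status(), StatusCode::FORBIDDEN);
        assert!(res.headers().contains_key("x-robots-tag"));
    }
}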

Step 3 — Per-route-group blocking with Scope::wrap()

When you only want to protect specific routes (e.g., your API), use web::scope().wrap(). Public routes like /health and /robots.txt remain fully accessible to all crawlers.

// Per-route-group blocking — only /api/* routes are protected
// Public routes (/health, /, /robots.txt) are unaffected.
use actix_web::{web, App, HttpServer, middleware};

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| {
        App::new()
            // Public routes — no middleware
            .route("/", web::get().to(index))
            .route("/health", web::get().to(health))

            // Protected API scope — AI bots get 403 on all /api/* routes
            .service(
                web::scope("/api")
                    .wrap(middleware::from_fn(ai_bot_blocker))
                    .route("/data", web::get().to(api_data))
                    .route("/users", web::get().to(users))
            )

            // Serve robots.txt from static dir — unprotected by design
            .service(actix_files::Files::new("/", "./static"))
    })
    .bind("0.0.0.0:8080")?
    .run()
    .await
}
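
A matching sketch (same illustrative test module as in Step 2) pins down the scoped behavior: a bot UA still reaches /health but gets a 403 on /api/data.

// In the same #[cfg(test)] mod tests as Step 2:
#[actix_web::test]
async fn scope_blocks_api_but_not_health() {
    let app = test::init_service(
        App::new()
            .route("/health", web::get().to(health))
            .service(
                web::scope("/api")
                    .wrap(middleware::from_fn(ai_bot_blocker))
                    .route("/data", web::get().to(api_data)),
            ),
    )
    .await;

    let bot = ("user-agent", "GPTBot/1.1");

    // Public route: reachable even with a bot UA
    let req = test::TestRequest::get().uri("/health").insert_header(bot).to_request();
    assert!(test::call_service(&app, req).await.status().is_success());

    // Scoped route: blocked
    let req = test::TestRequest::get().uri("/api/data").insert_header(bot).to_request();
    assert_eq!(
        test::call_service(&app, req).await.status(),
        actix_web::http::StatusCode::FORBIDDEN
    );
}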

Step 4 — Full Transform + Service middleware (advanced)

For dynamic block-lists, shared state (e.g., an Arc<RwLock<HashSet>> updated from Redis), or custom error types — implement Transform and Service directly. The new_transform() method runs once when the app is built (per server worker); call() runs on every request. A sketch of the dynamic-state variant follows the code.

// Full Transform + Service middleware — for dynamic block-lists or shared state
use actix_web::dev::{forward_ready, Service, ServiceRequest, ServiceResponse, Transform};
use actix_web::{Error, HttpResponse};
use futures_util::future::{ok, LocalBoxFuture, Ready};
use std::sync::Arc;

use crate::bots::is_ai_bot;

// The middleware factory — implements Transform
pub struct AiBotMiddleware;

impl<S, B> Transform<S, ServiceRequest> for AiBotMiddleware
where
    S: Service<ServiceRequest, Response = ServiceResponse<B>, Error = Error>,
    S::Future: 'static,
    B: 'static,
{
    type Response = ServiceResponse<actix_web::body::EitherBody<B>>;
    type Error = Error;
    type InitError = ();
    type Transform = AiBotMiddlewareService<S>;
    type Future = Ready<Result<Self::Transform, Self::InitError>>;

    fn new_transform(&self, service: S) -> Self::Future {
        ok(AiBotMiddlewareService { service: Arc::new(service) })
    }
}

// The middleware service — implements Service
pub struct AiBotMiddlewareService<S> {
    service: Arc<S>,
}

impl<S, B> Service<ServiceRequest> for AiBotMiddlewareService<S>
where
    S: Service<ServiceRequest, Response = ServiceResponse<B>, Error = Error>,
    S::Future: 'static,
    B: 'static,
{
    type Response = ServiceResponse<actix_web::body::EitherBody<B>>;
    type Error = Error;
    type Future = LocalBoxFuture<'static, Result<Self::Response, Self::Error>>;

    forward_ready!(service);

    fn call(&self, req: ServiceRequest) -> Self::Future {
        let ua = req
            .headers()
            .get("user-agent")
            .and_then(|v| v.to_str().ok())
            .unwrap_or("")
            .to_owned();

        let svc = Arc::clone(&self.service);

        Box::pin(async move {
            if is_ai_bot(&ua) {
                let (http_req, _payload) = req.into_parts();
                let response = HttpResponse::Forbidden()
                    .insert_header(("x-robots-tag", "noai, noimageai"))
                    .body("Forbidden");
                // The 403's BoxBody goes into the right arm of EitherBody<B>
                let res = ServiceResponse::new(http_req, response)
                    .map_into_right_body();
                return Ok(res);
            }

            let res = svc.call(req).await?;

            // Map the inner body into the left arm of EitherBody<B>,
            // then add X-Robots-Tag to all legitimate responses
            let mut res = res.map_into_left_body();
            res.response_mut()
                .headers_mut()
                .insert(
                    actix_web::http::header::HeaderName::from_static("x-robots-tag"),
                    actix_web::http::header::HeaderValue::from_static("noai, noimageai"),
                );
            Ok(res)
        })
    }
}

// Usage:
// App::new().wrap(AiBotMiddleware).service(...)
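
To make the block-list dynamic, swap the static is_ai_bot() check for shared state. A minimal sketch (DynamicBotList is a hypothetical type, not part of actix-web): store it as a field on both the factory and the service, clone it in new_transform(), and call matches() inside call().

// Hypothetical shared block-list for the Transform + Service middleware
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

#[derive(Clone)]
pub struct DynamicBotList {
    bots: Arc<RwLock<HashSet<String>>>,
}

impl DynamicBotList {
    pub fn new(initial: &[&str]) -> Self {
        Self {
            bots: Arc::new(RwLock::new(
                initial.iter().map(|s| s.to_lowercase()).collect(),
            )),
        }
    }

    /// Call from a background task (e.g., a Redis poller) to add entries.
    pub fn insert(&self, bot: &str) {
        self.bots.write().unwrap().insert(bot.to_lowercase());
    }

    /// Substring match against the lowercased User-Agent, like is_ai_bot().
    pub fn matches(&self, user_agent: &str) -> bool {
        let ua = user_agent.to_lowercase();
        self.bots.read().unwrap().iter().any(|bot| ua.contains(bot.as_str()))
    }
}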

Step 5 — robots.txt

Three approaches: static file via actix_files::Files, a dedicated route handler, or compile-time embedding with include_str!(). Mount Files::new() last — Actix-web matches routes in registration order, so handlers registered before Files take priority.

// Option A: Static file via actix-files
// Place robots.txt in ./static/robots.txt
// actix-files serves it automatically at GET /robots.txt
// Mount LAST — route handlers take priority over Files::new
//
// Cargo.toml: actix-files = "0.6"

// Inside the HttpServer::new(|| { ... }) app factory:
App::new()
    .route("/", web::get().to(index))
    // ... other routes ...
    // Files::new("/", "./static") serves any file under ./static at /filename
    .service(actix_files::Files::new("/", "./static"))


// Option B: Dedicated route handler (dynamic or compile-time embedded)
use actix_web::{get, HttpResponse, Responder};

const ROBOTS_TXT: &str = "User-agent: *
Allow: /

# AI training bots — blocked
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: AmazonBot
Disallow: /

User-agent: Diffbot
Disallow: /";

#[get("/robots.txt")]
async fn robots_txt() -> impl Responder {
    HttpResponse::Ok()
        .content_type("text/plain; charset=utf-8")
        .body(ROBOTS_TXT)
}

// Option C: Include from file at compile time
#[get("/robots.txt")]
async fn robots_embedded() -> impl Responder {
    HttpResponse::Ok()
        .content_type("text/plain; charset=utf-8")
        .body(include_str!("../static/robots.txt"))
}
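
Options B and C use macro routing, so register the handler with .service() instead of .route(); because it's registered before Files, it takes priority for GET /robots.txt. A quick sketch:

// Register the macro-routed handler ahead of the static file service
App::new()
    .service(robots_txt)                                // GET /robots.txt
    .service(actix_files::Files::new("/", "./static"))  // everything else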

Step 6 — noai meta tag in templates

Add <meta name="robots" content="noai, noimageai"> to every HTML page. With Askama (compile-time templates) or Tera/Handlebars (runtime), put it in your base template. Pair with the X-Robots-Tag header from middleware for belt-and-suspenders coverage.

// noai meta tag in Tera/Handlebars/Askama templates:
// <meta name="robots" content="noai, noimageai">

// With Askama (compile-time templates):
use askama::Template;
use actix_web::{get, HttpResponse};

#[derive(Template)]
#[template(source = r#"
<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noai, noimageai">
  <title>{{ title }}</title>
</head>
<body>{{ content }}</body>
</html>
"#, ext = "html")]
struct PageTemplate {
    title: String,
    content: String,
}

#[get("/")]
async fn index() -> HttpResponse {
    let tmpl = PageTemplate {
        title: "My Site".to_string(),
        content: "Hello".to_string(),
    };
    HttpResponse::Ok()
        .content_type("text/html; charset=utf-8")
        // X-Robots-Tag on the response header (belt + suspenders)
        .insert_header(("x-robots-tag", "noai, noimageai"))
        .body(tmpl.render().unwrap())
}
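
For Tera (runtime templates), the same tag lives in the template file. A minimal sketch, assuming tera = "1" in Cargo.toml and a templates/base.html containing the meta tag plus {{ title }} and {{ content }} placeholders:

// Tera variant — for brevity this parses the template dir per request;
// in a real app, build Tera once and share it via web::Data
use actix_web::{get, HttpResponse, Responder};
use tera::{Context, Tera};

#[get("/tera")]
async fn index_tera() -> impl Responder {
    let tera = Tera::new("templates/**/*.html").expect("template dir parses");
    let mut ctx = Context::new();
    ctx.insert("title", "My Site");
    ctx.insert("content", "Hello");
    HttpResponse::Ok()
        .content_type("text/html; charset=utf-8")
        .insert_header(("x-robots-tag", "noai, noimageai"))
        .body(tera.render("base.html", &ctx).unwrap())
}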

Actix-web vs Axum vs Warp vs Rocket

Feature by feature:

Middleware model
  • Actix-web: middleware::from_fn or the Transform + Service trait pair
  • Axum: Tower layers — Service<Request> → Service<Request>
  • Warp: Filter combinators — composable extractors + map/and_then
  • Rocket: Request Guards (per-route FromRequest) + Fairings (lifecycle)

Can abort request?
  • Actix-web: Yes — return HttpResponse::Forbidden() before next.call()
  • Axum: Yes — return Response from layer before calling inner service
  • Warp: Yes — return Rejection; map with recover()
  • Rocket: Guards: yes. Fairings: no (on_request returns ()).

Global blocking
  • Actix-web: App::wrap(middleware) — all routes
  • Axum: Router::layer(middleware) — all routes
  • Warp: .and(filter) composed at server level
  • Rocket: Fairing on_response override (handler still runs) or a guard on every route

Per-group blocking
  • Actix-web: web::scope("/api").wrap(middleware)
  • Axum: Router::nest("/api", ...).layer(middleware)
  • Warp: path("api").and(filter).and(route)
  • Rocket: No native scope middleware — use per-route guards

UA header access
  • Actix-web: req.headers().get("user-agent")
  • Axum: req.headers().get("user-agent")
  • Warp: warp::header::optional("user-agent")
  • Rocket: req.headers().get_one("User-Agent")

Hard 403
  • Actix-web: HttpResponse::Forbidden().finish()
  • Axum: (StatusCode::FORBIDDEN, "Forbidden").into_response()
  • Warp: warp::reject::custom(AiBotRejection)
  • Rocket: Outcome::Error((Status::Forbidden, ()))

Static files / robots.txt
  • Actix-web: actix_files::Files::new("/", "./static")
  • Axum: tower_http::services::ServeDir
  • Warp: warp::fs::dir("./static")
  • Rocket: FileServer::from("./static") or a #[get] route

Async runtime
  • Actix-web: Tokio (via #[actix_web::main])
  • Axum: Tokio (Tower ecosystem)
  • Warp: Tokio
  • Rocket: Tokio (via #[launch])

Summary

  • middleware::from_fn + App::wrap() — one function, global coverage, zero wasted computation. The right default.
  • Scope::wrap() — protect only /api/* while keeping public routes open. Add it to any scope, not just App.
  • Transform + Service — for dynamic block-lists or shared state. More boilerplate but full control.
  • actix_files::Files — mount last to serve robots.txt without blocking legitimate crawlers.
  • X-Robots-Tag — add to the response in middleware, after next.call(), so it appears on all legitimate responses.

Is your site protected from AI bots?

Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.