
How to Block AI Bots on Warp (Rust): Complete 2026 Guide

Warp is a composable, type-safe Rust web framework built on top of hyper. It has no traditional middleware — everything is a Filter. Bot blocking uses Warp's filter system: warp::reject::custom(AiBotRejection) short-circuits the filter chain before any handler runs, and .recover() converts all rejections into proper HTTP responses.

No middleware — only Filters

In Axum and Actix-web, middleware wraps route handlers as a chain. In Warp, a Filter is a composable unit that either succeeds (extracting a value and passing it downstream) or rejects (producing a Rejection that bypasses all downstream filters). Chain .and(bot_check()) to any route and the handler never executes for AI bots. The entire type chain is verified at compile time.
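As a tiny self-contained illustration of that model (the x-demo header name and the port are placeholders, not part of this guide's setup), the closure below runs only when the header filter succeeds; if the header is missing, the filter rejects and Warp sends its default error response:

// Minimal sketch of the Filter model (illustrative names only):
// a filter either extracts a value for the next stage or rejects,
// and a Rejection skips every downstream .and()/.map() stage.
use warp::Filter;

#[tokio::main]
async fn main() {
    // Requires an `x-demo` header; rejects the request if it is missing.
    let require_demo = warp::header::<String>("x-demo");

    // The closure never runs for requests that `require_demo` rejected.
    let route = warp::path("hello")
        .and(warp::get())
        .and(require_demo)
        .map(|value: String| format!("extracted: {value}"));

    warp::serve(route).run(([127, 0, 0, 1], 3030)).await;
}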

Protection layers

1. robots.txt: warp::fs::dir("./static") or an explicit route — always unblocked
2. noai meta tag: in warp::reply::html() responses or Tera/Handlebars templates
3. X-Robots-Tag header: .map(|r| warp::reply::with_header(r, "x-robots-tag", "noai, noimageai")) chained on routes
4. Hard 403 — per-route .and(bot_check()): the filter rejects AI bots, so the handler never runs; recover() returns a proper 403 response
5. Hard 403 — recover() fallback: catches all AiBotRejection instances regardless of which route they came from

Dependencies (Cargo.toml)

# Cargo.toml — required dependencies
[dependencies]
warp = "0.3"
tokio = { version = "1", features = ["full"] }
serde_json = "1"     # optional — for JSON replies

Step 1 — Shared bot list (src/bots.rs)

The same static-slice pattern as the other Rust frameworks: the bot strings are baked into the binary as a const slice, and detection is a lowercased substring match at request time.

// src/bots.rs — shared AI bot list
pub const AI_BOTS: &[&str] = &[
    // OpenAI
    "gptbot", "chatgpt-user", "oai-searchbot",
    // Anthropic
    "claudebot", "claude-web",
    // Common Crawl
    "ccbot",
    // Bytedance
    "bytespider",
    // Meta
    "meta-externalagent",
    // Perplexity
    "perplexitybot",
    // Google AI
    "google-extended", "googleother",
    // Cohere
    "cohere-ai",
    // Amazon
    "amazonbot",
    // Diffbot
    "diffbot",
    // AI2
    "ai2bot",
    // DeepSeek
    "deepseekbot",
    // Mistral
    "mistralai-user",
    // xAI
    "xai-bot",
    // You.com
    "youbot",
    // DuckDuckGo AI
    "duckassistbot",
];

pub fn is_ai_bot(user_agent: &str) -> bool {
    let ua = user_agent.to_lowercase();
    AI_BOTS.iter().any(|bot| ua.contains(bot))
}
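
A quick unit-test sketch (the user-agent strings are illustrative) confirms the case-insensitive substring match:

// Unit test sketch for src/bots.rs; UA strings are illustrative examples.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn detects_ai_bots_case_insensitively() {
        assert!(is_ai_bot("Mozilla/5.0 (compatible; GPTBot/1.0)"));
        assert!(is_ai_bot("CCBot/2.0"));
        assert!(!is_ai_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/125.0"));
    }
}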

Step 2 — Custom Rejection and bot-check filter

impl Reject for AiBotRejection is the marker that makes a type usable as a Warp rejection. The bot_check() filter uses warp::header::optional() to extract the User-Agent without requiring it — missing UA passes as an empty string. .untuple_one() strips the wrapper tuple so .and(bot_check()) composes cleanly without adding an extra value to the handler's argument list.

// src/filters.rs — custom Rejection and bot-check filter

use warp::reject::Reject;
use warp::Filter;

use crate::bots::is_ai_bot;

/// Custom rejection type for AI bot blocking.
/// Reject is a marker trait — no methods needed.
#[derive(Debug)]
pub struct AiBotRejection;
impl Reject for AiBotRejection {}

/// Filter that rejects AI bots with a custom Rejection.
///
/// Usage: route.and(bot_check())
/// - AI bot  → warp::reject::custom(AiBotRejection) — handler never runs
/// - Legit   → passes () to the next filter in the chain
pub fn bot_check() -> impl Filter<Extract = (), Error = warp::Rejection> + Clone {
    warp::header::optional::<String>("user-agent")
        .and_then(|ua: Option<String>| async move {
            let ua_str = ua.as_deref().unwrap_or("");
            if is_ai_bot(ua_str) {
                Err(warp::reject::custom(AiBotRejection))
            } else {
                Ok(())
            }
        })
        // untuple_one: converts Filter<Extract=((),)> to Filter<Extract=()>
        // so it can be chained with .and() cleanly
        .untuple_one()
}
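
warp::test can exercise the filter in isolation. A sketch, assuming it lives in a #[cfg(test)] module at the bottom of src/filters.rs:

// Test sketch: the filter rejects an AI crawler UA and passes a browser UA.
#[cfg(test)]
mod tests {
    use super::*;
    use warp::Filter;

    #[tokio::test]
    async fn bot_check_rejects_ai_user_agents() {
        // Attach a trivial handler so the composed filter extracts a value.
        let route = bot_check().map(|| "ok");

        // An AI crawler UA is rejected before the handler would run.
        let bot = warp::test::request()
            .header("user-agent", "GPTBot/1.0")
            .filter(&route)
            .await;
        assert!(bot.is_err());

        // A browser UA passes through to the handler.
        let human = warp::test::request()
            .header("user-agent", "Mozilla/5.0 (X11; Linux x86_64)")
            .filter(&route)
            .await;
        assert_eq!(human.unwrap(), "ok");
    }
}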

Step 3 — recover() — rejection to HTTP response

Without recover(), Warp turns an unhandled custom rejection into a generic 500 Internal Server Error. The err.find::<AiBotRejection>() call downcasts the rejection to our custom type. Because the handler returns Result<_, Infallible>, every rejection must be converted into a response, so always keep the final catch-all arm for rejections you did not anticipate.

// src/handlers.rs — recover() converts Rejections to HTTP responses

use std::convert::Infallible;
use warp::http::StatusCode;
use warp::Rejection;

use crate::filters::AiBotRejection;

/// Global rejection handler — converts all Rejections to HTTP responses.
/// Must be passed to .recover() on your top-level route.
pub async fn handle_rejection(err: Rejection) -> Result<impl warp::Reply, Infallible> {
    if err.find::<AiBotRejection>().is_some() {
        // AI bot — return 403 with X-Robots-Tag
        return Ok(warp::reply::with_status(
            warp::reply::with_header(
                "Forbidden",
                "x-robots-tag",
                "noai, noimageai",
            ),
            StatusCode::FORBIDDEN,
        ));
    }

    if err.is_not_found() {
        return Ok(warp::reply::with_status(
            warp::reply::with_header("Not Found", "x-robots-tag", "noai, noimageai"),
            StatusCode::NOT_FOUND,
        ));
    }

    // Fallback — don't swallow unknown rejections
    Ok(warp::reply::with_status(
        warp::reply::with_header("Internal Server Error", "x-robots-tag", "noai, noimageai"),
        StatusCode::INTERNAL_SERVER_ERROR,
    ))
}

Step 4 — Route composition with .and(bot_check())

Chain .and(bot_check()) before the handler closure on every route you want to protect. robots.txt and /health are intentionally left unprotected — legitimate search crawlers need robots.txt access.

// src/main.rs — composing routes with bot_check filter

use warp::Filter;

mod bots;
mod filters;
mod handlers;

#[tokio::main]
async fn main() {
    // robots.txt — MUST be unblocked for legitimate crawlers.
    // Register before bot_check routes so it doesn't get caught.
    let robots = warp::path("robots.txt")
        .and(warp::get())
        .map(|| {
            warp::reply::with_header(
                include_str!("../static/robots.txt"),
                "content-type",
                "text/plain; charset=utf-8",
            )
        });

    // Health — no bot check, always accessible
    let health = warp::path("health")
        .and(warp::get())
        .map(|| "ok");

    // Protected index — bot_check() runs before the handler.
    // AI bots → AiBotRejection (handler never runs).
    // Legit requests → passes through to the closure.
    let index = warp::path::end()
        .and(warp::get())
        .and(filters::bot_check())   // ← chains the filter
        .map(|| "Welcome to my site");

    // Protected API — same pattern, scoped to /api
    let api_data = warp::path!("api" / "data")
        .and(warp::get())
        .and(filters::bot_check())
        .map(|| warp::reply::json(&serde_json::json!({ "data": "protected" })));

    // Static files (fallback) — serves ./static directory
    let static_files = warp::fs::dir("./static");

    // Combine all routes — .or() tries each in order
    let routes = robots
        .or(health)
        .or(index)
        .or(api_data)
        .or(static_files)
        // Map all successful responses to include X-Robots-Tag
        .map(|reply| {
            warp::reply::with_header(reply, "x-robots-tag", "noai, noimageai")
        })
        // Convert all Rejections (including AiBotRejection) to HTTP responses
        .recover(handlers::handle_rejection);

    println!("Listening on 0.0.0.0:8080");
    warp::serve(routes).run(([0, 0, 0, 0], 8080)).await;
}
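
To confirm the whole path works, a test sketch using warp::test and the module layout above: a GPTBot user agent on the index route should get the 403 and the X-Robots-Tag header produced by handle_rejection.

// Test sketch (src/main.rs, #[cfg(test)]): verifying the 403 path end to end.
#[cfg(test)]
mod tests {
    use warp::http::StatusCode;
    use warp::Filter;

    #[tokio::test]
    async fn gptbot_gets_403_on_index() {
        // Rebuild the protected index route exactly as in main().
        let index = warp::path::end()
            .and(warp::get())
            .and(crate::filters::bot_check())
            .map(|| "Welcome to my site")
            .recover(crate::handlers::handle_rejection);

        let res = warp::test::request()
            .method("GET")
            .path("/")
            .header("user-agent", "GPTBot/1.0")
            .reply(&index)
            .await;

        assert_eq!(res.status(), StatusCode::FORBIDDEN);
        assert_eq!(res.headers()["x-robots-tag"], "noai, noimageai");
    }
}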

Step 5 — Global blocking note (Warp's limitation)

No dedicated global middleware hook: unlike Axum's Router::layer() or Actix-web's App::wrap(), Warp has no middleware API that automatically covers every route. You either chain .and(bot_check()) to each route you want protected, or compose the check once in front of a combined route group, as in the helper below. For large route sets, the helper-function approach keeps this manageable. Alternatively, put a reverse proxy (nginx, Caddy) in front of Warp for zero-configuration global blocking.

// Alternative: apply bot_check() once, in front of a combined route group

use warp::Filter;

// Wrap a group of routes with bot_check applied to all of them.
// (Assumes the `filters` module from Step 2 is declared, as in Step 4's main.rs.)
// bot_check() runs before any route in the group is matched, so an AI bot
// hitting any path handled here is rejected with AiBotRejection.
fn protected_routes() -> impl Filter<Extract = impl warp::Reply, Error = warp::Rejection> + Clone {
    let index = warp::path::end()
        .and(warp::get())
        .map(|| "Home page");

    let dashboard = warp::path("dashboard")
        .and(warp::get())
        .map(|| "Dashboard");

    let api = warp::path("api")
        .and(warp::path("data"))
        .and(warp::get())
        .map(|| warp::reply::json(&serde_json::json!({ "data": "ok" })));

    // Warp has no scope middleware like Axum's route_layer() or Actix-web's
    // Scope::wrap(), but filters compose: chaining bot_check() in front of
    // the combined group applies the check to every route inside it.
    filters::bot_check().and(index.or(dashboard).or(api))
}

// For truly global blocking, either compose bot_check() in front of the
// top-level route set as shown above, or chain it onto every route.
// Both are more verbose than Axum's Router::layer(), but together with
// recover() they achieve the same effect.

Step 6 — robots.txt

Always register the robots.txt route without bot_check() — legitimate search crawlers need to read it. Register it before warp::fs::dir() so the explicit handler takes priority over the static file fallback.

// robots.txt in Warp — three approaches

// Option A: warp::fs::dir — serves ./static directory including robots.txt
// Place robots.txt in ./static/robots.txt
// warp::fs::dir serves any file at its path: /robots.txt → ./static/robots.txt
let static_files = warp::fs::dir("./static");


// Option B: Explicit route with inline content
const ROBOTS_TXT: &str = "User-agent: *
Allow: /

# AI training bots — blocked
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: AmazonBot
Disallow: /

User-agent: Diffbot
Disallow: /";

let robots = warp::path("robots.txt")
    .and(warp::get())
    .map(|| {
        warp::reply::with_header(ROBOTS_TXT, "content-type", "text/plain; charset=utf-8")
    });


// Option C: Compile-time embed with include_str!()
let robots_embedded = warp::path("robots.txt")
    .and(warp::get())
    .map(|| {
        warp::reply::with_header(
            include_str!("../static/robots.txt"),
            "content-type",
            "text/plain; charset=utf-8",
        )
    });

Step 7 — noai meta tag in HTML responses
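
Serve the noai directive in the page markup as well. warp::reply::html() sets the text/html content type; put the meta tag in the <head> of every rendered page so it reaches crawlers that parse HTML but ignore response headers.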

// noai meta tag in HTML responses

use warp::Filter;

// Return HTML with noai meta tag directly from a handler
let index = warp::path::end()
    .and(warp::get())
    .and(filters::bot_check())
    .map(|| {
        warp::reply::html(r#"<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noai, noimageai">
  <title>My Site</title>
</head>
<body>
  <h1>Welcome</h1>
</body>
</html>"#)
    });

// With Tera templates:
// In your base template, add the meta tag inside <head>.
// Pass the rendered string to warp::reply::html().
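
A short sketch of that Tera setup, assuming tera = "1" is added to Cargo.toml, templates live in ./templates, and index.html is a hypothetical template whose <head> contains the noai meta tag:

// Tera sketch: render the template and hand the String to warp::reply::html().
// In the real app you would also chain .and(filters::bot_check()) as in Step 4.
use std::sync::Arc;
use tera::{Context, Tera};
use warp::Filter;

#[tokio::main]
async fn main() {
    // Compile all templates under ./templates at startup.
    let tera = Arc::new(Tera::new("templates/**/*.html").expect("template parse error"));

    let index = warp::path::end()
        .and(warp::get())
        .map(move || {
            let mut ctx = Context::new();
            ctx.insert("title", "My Site");
            let page = tera
                .render("index.html", &ctx)
                .unwrap_or_else(|_| "<h1>template error</h1>".to_string());
            warp::reply::html(page)
        });

    warp::serve(index).run(([127, 0, 0, 1], 3030)).await;
}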

Warp vs Axum vs Actix-web vs Rocket

Abstraction model
  Warp: Filter combinators — .and(), .or(), .map(), .and_then()
  Axum: Router + Tower layers — from_fn() middleware
  Actix-web: App + wrap_fn closures or Transform + Service
  Rocket: Routes + Request Guards + Fairings (lifecycle)

Request blocking
  Warp: warp::reject::custom(AiBotRejection) — Rejection bubbles up to recover()
  Axum: return a Response early without calling next.run(req)
  Actix-web: return an HttpResponse early without calling next.call(req)
  Rocket: Outcome::Error((Status::Forbidden, ())) in FromRequest

Global blocking
  Warp: chain .and(bot_check()) per route; recover() handles all rejections globally
  Axum: Router::layer(from_fn(blocker)) — single declaration
  Actix-web: App::wrap(middleware) — single declaration
  Rocket: no true global middleware — fairing override (handler still runs)

Blocking scope
  Warp: per-route with .and() — no scope/group middleware concept
  Axum: Router::layer() (global) or route_layer() (matched routes)
  Actix-web: App::wrap() (global) or Scope::wrap() (prefixed group)
  Rocket: per-route guards only; no group scope

Add response header
  Warp: .map(|r| warp::reply::with_header(r, "x-robots-tag", "noai, noimageai"))
  Axum: modify res.headers_mut() after next.run(), or SetResponseHeaderLayer
  Actix-web: modify res.headers_mut() after next.call()
  Rocket: Fairing on_response: res.set_raw_header()

UA header access
  Warp: warp::header::optional::<String>("user-agent")
  Axum: req.headers().get(header::USER_AGENT)
  Actix-web: req.headers().get("user-agent")
  Rocket: req.headers().get_one("User-Agent")

Static files / robots.txt
  Warp: warp::fs::dir("./static") or an explicit route
  Axum: tower_http::services::ServeDir
  Actix-web: actix_files::Files::new("/", "./static")
  Rocket: FileServer::from("./static") or a #[get] route

Compile-time checks
  Warp: yes — filter types fully checked at compile time (verbose errors)
  Axum: partial — extractor types checked, some runtime checks remain
  Actix-web: partial — wrap_fn checked; Transform implementation partially runtime
  Rocket: yes — route types and guard types checked at compile time

Summary

  • .and(bot_check()) — chain before any handler to block AI bots per-route. Handler never runs for blocked requests.
  • warp::reject::custom(AiBotRejection) — the Warp-idiomatic way to short-circuit. Always pair with .recover().
  • recover() — global rejection handler. Converts AiBotRejection to 403. Handles 404s and errors in one place.
  • No group middleware — chain bot_check() per-route, compose it in front of a combined route group, or put nginx/Caddy in front for zero-config global blocking.
  • robots.txt unblocked — always register it before bot_check() routes so legitimate crawlers can read it.

Is your site protected from AI bots?

Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.