How to Block AI Bots in Rust Poem

Poem is a full-featured async Rust web framework built on hyper and tokio. Its Middleware trait has one method, transform(&self, ep: E) -> Self::Output, which is called once at startup to wrap the inner Endpoint. Per-request bot detection runs inside the wrapper struct's Endpoint::call implementation. req.headers().get(USER_AGENT) returns Option<&HeaderValue>; calling to_str() on that value returns Result<&str, ToStrError>, which fails when the value contains bytes outside visible ASCII (header values are raw bytes, not guaranteed text). Extract the User-Agent before calling self.0.call(req): the Request is moved into the inner endpoint and cannot be accessed afterwards.
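The Option and Result chain described above can be sketched without Poem. The helper `visible_ascii_str` below is a hypothetical stand-in for HeaderValue::to_str(): it assumes the rule is "every byte must be visible ASCII or a tab" and returns Option instead of Result for brevity. It is a model for illustration under that assumption, not the library's code.

```rust
/// Hypothetical stand-in for HeaderValue::to_str(): accepts a raw header
/// value only if every byte is visible ASCII (0x20..=0x7E) or a tab.
fn visible_ascii_str(raw: &[u8]) -> Option<&str> {
    if raw.iter().all(|&b| b == b'\t' || (0x20..=0x7e).contains(&b)) {
        // All bytes are ASCII here, so the slice is also valid UTF-8.
        std::str::from_utf8(raw).ok()
    } else {
        None
    }
}

/// Mirrors the extraction chain: a missing header (None) and an
/// unreadable header both collapse to the empty string.
fn user_agent_or_empty(raw_header: Option<&[u8]>) -> &str {
    raw_header.and_then(visible_ascii_str).unwrap_or("")
}
```

Under this model a value such as "bötli" is rejected even though it is valid UTF-8, which is why the fallback to an empty string matters: the detector then simply reports "not a bot".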

1. Bot detection

Pure Rust, no dependencies: to_ascii_lowercase() for case-folding and contains() for substring matching. The crawlers listed below identify themselves with plain-ASCII User-Agents, so a header value that fails to_str() is treated as "not a bot".

// ai_bot_detector.rs — AI bot detection, no external dependencies

/// Known AI crawler User-Agent substrings.
/// All lowercase — compared against a lowercased User-Agent.
const AI_BOT_PATTERNS: &[&str] = &[
    "gptbot",
    "chatgpt-user",
    "claudebot",
    "anthropic-ai",
    "ccbot",
    "google-extended",
    "cohere-ai",
    "meta-externalagent",
    "bytespider",
    "omgili",
    "diffbot",
    "imagesiftbot",
    "magpie-crawler",
    "amazonbot",
    "dataprovider",
    "netcraft",
];

/// Returns true if the User-Agent matches a known AI crawler.
///
/// Callers obtain `ua` from HeaderValue::to_str(), which returns
/// Result<&str, ToStrError>: header values CAN contain bytes outside
/// visible ASCII (RFC 7230 allows visible ASCII + obs-text).
/// Such UAs are not AI bots, so callers treat Err as "not a bot".
pub fn is_ai_bot(ua: &str) -> bool {
    if ua.is_empty() {
        return false;
    }
    let lower = ua.to_ascii_lowercase();
    AI_BOT_PATTERNS.iter().any(|p| lower.contains(p))
}
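A quick way to see the matching rules in action is to run a few sample strings through the detector. The sketch below uses a reduced copy of the pattern list so that it compiles on its own. The `matched_pattern` helper and the sample User-Agent strings are illustrative assumptions, not part of the article's module.

```rust
// Reduced copy of the pattern list so the example is self-contained.
const SAMPLE_PATTERNS: &[&str] = &["gptbot", "claudebot", "ccbot"];

/// Hypothetical variant of is_ai_bot that also reports WHICH pattern
/// matched, which can be handy for logging blocked requests.
fn matched_pattern(ua: &str) -> Option<&'static str> {
    if ua.is_empty() {
        return None;
    }
    // ASCII-only lowercasing is enough: every pattern is plain ASCII.
    let lower = ua.to_ascii_lowercase();
    SAMPLE_PATTERNS.iter().copied().find(|p| lower.contains(*p))
}

/// Same boolean contract as the detector above.
fn is_ai_bot_sample(ua: &str) -> bool {
    matched_pattern(ua).is_some()
}
```

Matching is by substring, so any User-Agent that merely contains a pattern (for example a made-up "MyCCBotMonitor/2.0") is flagged too. Keep the patterns specific enough that this is acceptable.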

2. Middleware and server setup

The Middleware trait wraps an Endpoint in a new struct. The wrapper implements Endpoint and runs bot detection in call(). Apply with .with(AiBotBlocker) on any Route.

// main.rs — Poem server with AI bot blocking middleware
// Cargo.toml: poem = "3", tokio = { version = "1", features = ["full"] }

mod ai_bot_detector;

use ai_bot_detector::is_ai_bot;
// EndpointExt provides the .with() combinator used in main().
use poem::{
    handler, http::header, http::StatusCode, listener::TcpListener, Endpoint,
    EndpointExt, IntoResponse, Middleware, Request, Response, Result, Route,
    Server,
};

// ── Middleware ─────────────────────────────────────────────────────────────
//
// Poem's Middleware trait has one method:
//   fn transform(&self, ep: E) -> Self::Output
//
// - transform() is called ONCE at startup — it wraps the inner Endpoint
// - Per-request logic lives in the Endpoint impl on the wrapper struct
// - The wrapper's call() receives Request by value (moved, not borrowed)
//
// This is simpler than Tower (Layer + Service) or Actix (Transform + Service):
//   - No associated Future types
//   - No Pin<Box<dyn Future>>
//   - No async factory method

struct AiBotBlocker;

impl<E: Endpoint> Middleware<E> for AiBotBlocker {
    type Output = AiBotBlockerEndpoint<E>;

    fn transform(&self, ep: E) -> Self::Output {
        AiBotBlockerEndpoint(ep)
    }
}

struct AiBotBlockerEndpoint<E>(E);

impl<E: Endpoint> Endpoint for AiBotBlockerEndpoint<E> {
    type Output = Response;

    async fn call(&self, req: Request) -> Result<Self::Output> {
        let path = req.uri().path().to_owned();

        // Always allow robots.txt so crawlers discover Disallow rules.
        if path == "/robots.txt" {
            return Ok(self.0.call(req).await?.into_response());
        }

        // Extract User-Agent BEFORE forwarding — Request is moved into inner.
        // req.headers().get() returns Option<&HeaderValue>.
        // to_str() returns Result<&str, ToStrError> — non-UTF-8 is not a bot.
        let ua = req
            .headers()
            .get(header::USER_AGENT)
            .and_then(|v| v.to_str().ok())
            .unwrap_or("");

        if is_ai_bot(ua) {
            // Block: return 403 without calling the inner endpoint.
            return Ok(Response::builder()
                .status(StatusCode::FORBIDDEN)
                .header("Content-Type", "text/plain")
                .header("X-Robots-Tag", "noai, noimageai")
                .body("Forbidden"));
        }

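        // Pass: forward to inner, then inject X-Robots-Tag on the response.
        // from_static() takes a lowercase header name and needs no runtime
        // parsing or unwrap().
        let mut resp = self.0.call(req).await?.into_response();
        resp.headers_mut().insert(
            header::HeaderName::from_static("x-robots-tag"),
            header::HeaderValue::from_static("noai, noimageai"),
        );
        Ok(resp)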
    }
}

// ── Handlers ──────────────────────────────────────────────────────────────

const ROBOTS_TXT: &str = "\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
";

#[handler]
fn robots_txt() -> Response {
    Response::builder()
        .status(StatusCode::OK)
        .header("Content-Type", "text/plain")
        .body(ROBOTS_TXT)
}

#[handler]
fn index() -> &'static str {
    "ok"
}

// ── Server bootstrap ─────────────────────────────────────────────────────
//
// .with(AiBotBlocker) applies the middleware to the entire Route.
// Middleware is outermost — it sees the request first, response last.
// Multiple .with() calls stack: last applied = outermost.

#[tokio::main]
async fn main() -> std::result::Result<(), std::io::Error> {
    let app = Route::new()
        .at("/", poem::get(index))
        .at("/robots.txt", poem::get(robots_txt))
        .with(AiBotBlocker);  // applied to all routes

    Server::new(TcpListener::bind("0.0.0.0:3000"))
        .run(app)
        .await
}
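The wrapping and ordering rules from the comments above (outermost layer sees the request first, last applied is outermost, a blocker returns without calling the inner endpoint) can be modelled in plain Rust. Everything in this sketch is hypothetical: `MiniEndpoint`, `Tagged`, `Blocker` and `with` are synchronous stand-ins invented for illustration and are not Poem's types.

```rust
/// Minimal synchronous stand-in for an endpoint.
trait MiniEndpoint {
    fn call(&self, req: &str) -> String;
}

/// Innermost handler: echoes the request it received.
struct Handler;

impl MiniEndpoint for Handler {
    fn call(&self, req: &str) -> String {
        format!("handler({req})")
    }
}

/// Wrapper that brackets the inner response with its own name,
/// making the nesting order visible in the output.
struct Tagged<E> {
    name: &'static str,
    inner: E,
}

impl<E: MiniEndpoint> MiniEndpoint for Tagged<E> {
    fn call(&self, req: &str) -> String {
        let inner = self.inner.call(req);
        format!("{}[{}]", self.name, inner)
    }
}

/// Wrapper that short-circuits: the inner endpoint is never called
/// when the request is rejected.
struct Blocker<E>(E);

impl<E: MiniEndpoint> MiniEndpoint for Blocker<E> {
    fn call(&self, req: &str) -> String {
        if req.contains("bot") {
            return "403".to_string();
        }
        self.0.call(req)
    }
}

/// Hypothetical equivalent of .with(): wraps `ep` in one more layer.
fn with<E: MiniEndpoint>(ep: E, name: &'static str) -> Tagged<E> {
    Tagged { name, inner: ep }
}

/// Applies "first", then "second"; "second" ends up outermost.
fn layered_trace(req: &str) -> String {
    with(with(Handler, "first"), "second").call(req)
}
```

Calling `layered_trace("GET /")` yields `second[first[handler(GET /)]]`: the layer applied last wraps everything applied before it, which is the same ordering the .with() comments describe.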

3. Per-handler blocking

For simple cases, check the User-Agent inside the handler. Poem's #[handler] macro can receive &Request (borrowed) — the macro handles ownership so you don't need to extract headers before a move.

// Per-handler bot blocking — no Middleware trait needed.
//
// For simple cases, check the User-Agent inside the handler itself.
// Poem's #[handler] macro converts async fn into an Endpoint.

use crate::ai_bot_detector::is_ai_bot;
use poem::{handler, http::header, http::StatusCode, Request, Response, Result};

#[handler]
async fn protected_endpoint(req: &Request) -> Result<Response> {
    // #[handler] can receive &Request (borrowed, not moved).
    // This is a Poem convenience — the macro handles ownership.
    let ua = req
        .headers()
        .get(header::USER_AGENT)
        .and_then(|v| v.to_str().ok())
        .unwrap_or("");

    if is_ai_bot(ua) {
        return Ok(Response::builder()
            .status(StatusCode::FORBIDDEN)
            .header("X-Robots-Tag", "noai, noimageai")
            .body("Forbidden"));
    }

    Ok(Response::builder()
        .status(StatusCode::OK)
        .header("X-Robots-Tag", "noai, noimageai")
        .body("protected content"))
}

// In Route:
// Route::new().at("/api/data", poem::get(protected_endpoint))

4. Route-scoped middleware

Poem's .with() applies to the Route it's called on. Use .nest() to mount sub-routes with their own middleware — only the nested routes get bot blocking.

// Route-scoped middleware — protect only /api/* routes.
//
// In Poem, .with() applies to the Route it's called on.
// Use nested Routes to scope middleware to specific path prefixes.

use poem::{Route, EndpointExt};

let api_routes = Route::new()
    .at("/data", poem::get(api_data))
    .at("/users", poem::get(api_users))
    .with(AiBotBlocker);  // only /api/* gets bot blocking

let public_routes = Route::new()
    .at("/", poem::get(index))
    .at("/about", poem::get(about));
    // no bot blocking on public routes

let app = Route::new()
    .nest("/api", api_routes)    // /api/data, /api/users — blocked
    .nest("/", public_routes)    // /, /about — not blocked
    .at("/robots.txt", poem::get(robots_txt));

// Alternative: check path inside the middleware itself.
// Modify AiBotBlockerEndpoint::call to skip non-API paths:
//
//   if !path.starts_with("/api/") {
//       return Ok(self.0.call(req).await?.into_response());
//   }
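The path check in the alternative above has one edge case worth spelling out: starts_with("/api/") does not match the bare path "/api". A minimal sketch of a stricter predicate follows. `is_api_path` is a hypothetical helper name, and whether "/api" itself should count as protected is an assumption to adjust for your routes.

```rust
/// Hypothetical helper: true when `path` is the API root or lies below it.
/// A bare starts_with("/api") would also match unrelated paths such as
/// "/apiary", so the trailing slash is checked explicitly.
fn is_api_path(path: &str) -> bool {
    path == "/api" || path.starts_with("/api/")
}
```

Inside the middleware, the early return would then read `if !is_api_path(&path) { ... }` in place of the starts_with() call.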

Key points

Framework comparison — Rust web middleware models

| Framework | Middleware model | Block request | Header access |
|---|---|---|---|
| Poem | Middleware + Endpoint wrapper | Return Ok(Response 403) without calling inner | req.headers().get() -> Option<&HeaderValue> |
| Axum | Tower Layer + Service | Return Ok(Response 403) without calling inner | req.headers().get() -> Option<&HeaderValue> |
| Actix-web | Transform + Service | Return Ok(req.into_response(HttpResponse::Forbidden())) | req.headers().get() -> Option<&HeaderValue> |
| Rocket | Request Guards (FromRequest) | Outcome::Error((Status::Forbidden, ())) | req.headers().get_one() -> Option<&str> |

Poem's Middleware + Endpoint approach requires the least boilerplate of the four — no Pin<Box<dyn Future>>, no associated Future types, no async factory. The trade-off: Poem middleware is framework-specific, while Tower middleware (Axum) is reusable across any Tower-compatible service (Tonic gRPC, raw Hyper, etc.).