How to Block AI Bots in Rust Poem
Poem is a full-featured async Rust web framework built on hyper and tokio. Its Middleware trait has one method, transform(&self, ep: E) -> Self::Output, which is called once at startup to wrap the inner Endpoint. Per-request bot detection runs inside the wrapper struct's Endpoint::call implementation. req.headers().get(USER_AGENT) returns Option<&HeaderValue>; call to_str() on the value, which returns Result<&str, ToStrError>. The conversion is fallible because header values can contain bytes outside visible ASCII (obs-text in RFC 7230), and to_str() rejects them. Extract the User-Agent before calling self.0.call(req): the Request is moved into the inner endpoint and cannot be accessed afterwards.
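To make the fallible conversion concrete, here is a minimal std-only sketch of the rule described above. Both visible_ascii_str and user_agent_or_empty are hypothetical helpers written for illustration; they imitate the behaviour of HeaderValue::to_str() (accept horizontal tab and visible ASCII, reject everything else) and are not part of Poem or the http crate.

```rust
/// Hypothetical stand-in for `HeaderValue::to_str()`, for illustration only.
/// Accepts horizontal tab and visible ASCII (0x20..=0x7E); rejects any other byte.
fn visible_ascii_str(raw: &[u8]) -> Option<&str> {
    if raw.iter().all(|&b| b == b'\t' || (0x20..0x7f).contains(&b)) {
        // Every byte is ASCII here, so the UTF-8 conversion succeeds.
        std::str::from_utf8(raw).ok()
    } else {
        None
    }
}

/// Same shape as `.get(USER_AGENT).and_then(|v| v.to_str().ok()).unwrap_or("")`:
/// a missing header and a rejected header both collapse to the empty string.
fn user_agent_or_empty(raw: Option<&[u8]>) -> &str {
    raw.and_then(visible_ascii_str).unwrap_or("")
}
```

Note that a value can be valid UTF-8 and still be rejected: any non-ASCII character fails the check, so the empty-string fallback covers more than malformed bytes.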
1. Bot detection
Pure Rust, no dependencies: to_ascii_lowercase() for case-folding, contains() for substring matching. A User-Agent that to_str() rejects (any byte outside visible ASCII) is treated as "not a bot".
// ai_bot_detector.rs — AI bot detection, no external dependencies
/// Known AI crawler User-Agent substrings.
/// All lowercase — compared against a lowercased User-Agent.
const AI_BOT_PATTERNS: &[&str] = &[
    "gptbot",
    "chatgpt-user",
    "claudebot",
    "anthropic-ai",
    "ccbot",
    "google-extended",
    "cohere-ai",
    "meta-externalagent",
    "bytespider",
    "omgili",
    "diffbot",
    "imagesiftbot",
    "magpie-crawler",
    "amazonbot",
    "dataprovider",
    "netcraft",
];
/// Returns true if the User-Agent matches a known AI crawler.
///
/// HeaderValue::to_str() returns Result<&str, ToStrError>: header values CAN
/// contain bytes outside visible ASCII (RFC 7230 obs-text), and to_str()
/// rejects them. Callers map that Err to "" so it counts as "not a bot".
pub fn is_ai_bot(ua: &str) -> bool {
    if ua.is_empty() {
        return false;
    }
    let lower = ua.to_ascii_lowercase();
    AI_BOT_PATTERNS.iter().any(|p| lower.contains(p))
}
2. Middleware and server setup
The Middleware trait wraps an Endpoint in a new struct. The wrapper implements Endpoint and runs bot detection in call(). Apply it with .with(AiBotBlocker) on any Route; .with() comes from the EndpointExt trait, which must be in scope.
// main.rs — Poem server with AI bot blocking middleware
// Cargo.toml: poem = "3", tokio = { version = "1", features = ["full"] }
mod ai_bot_detector;
use ai_bot_detector::is_ai_bot;
use poem::{
    handler, http::header, http::StatusCode, listener::TcpListener, Endpoint,
    EndpointExt, IntoResponse, Middleware, Request, Response, Result, Route, Server,
};
// ── Middleware ─────────────────────────────────────────────────────────────
//
// Poem's Middleware trait has one method:
// fn transform(&self, ep: E) -> Self::Output
//
// - transform() is called ONCE at startup — it wraps the inner Endpoint
// - Per-request logic lives in the Endpoint impl on the wrapper struct
// - The wrapper's call() receives Request by value (moved, not borrowed)
//
// This is simpler than Tower (Layer + Service) or Actix (Transform + Service):
// - No associated Future types
// - No Pin<Box<dyn Future>>
// - No async factory method
struct AiBotBlocker;
impl<E: Endpoint> Middleware<E> for AiBotBlocker {
    type Output = AiBotBlockerEndpoint<E>;
    fn transform(&self, ep: E) -> Self::Output {
        AiBotBlockerEndpoint(ep)
    }
}
struct AiBotBlockerEndpoint<E>(E);
impl<E: Endpoint> Endpoint for AiBotBlockerEndpoint<E> {
    type Output = Response;
    async fn call(&self, req: Request) -> Result<Self::Output> {
        let path = req.uri().path().to_owned();
        // Always allow robots.txt so crawlers discover Disallow rules.
        if path == "/robots.txt" {
            return Ok(self.0.call(req).await?.into_response());
        }
        // Extract User-Agent BEFORE forwarding: Request is moved into inner.
        // req.headers().get() returns Option<&HeaderValue>.
        // to_str() returns Result<&str, ToStrError>; on Err, treat as not a bot.
        let ua = req
            .headers()
            .get(header::USER_AGENT)
            .and_then(|v| v.to_str().ok())
            .unwrap_or("");
        if is_ai_bot(ua) {
            // Block: return 403 without calling the inner endpoint.
            return Ok(Response::builder()
                .status(StatusCode::FORBIDDEN)
                .header("Content-Type", "text/plain")
                .header("X-Robots-Tag", "noai, noimageai")
                .body("Forbidden"));
        }
        // Pass: forward to inner, then inject X-Robots-Tag on the response.
        let mut resp = self.0.call(req).await?.into_response();
        // from_static gives the key a concrete type (a bare .parse() leaves it
        // ambiguous); names passed to HeaderName::from_static must be lowercase.
        resp.headers_mut().insert(
            header::HeaderName::from_static("x-robots-tag"),
            header::HeaderValue::from_static("noai, noimageai"),
        );
        Ok(resp)
    }
}
// ── Handlers ──────────────────────────────────────────────────────────────
const ROBOTS_TXT: &str = "\
User-agent: *
Allow: /
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
";
#[handler]
fn robots_txt() -> Response {
    Response::builder()
        .status(StatusCode::OK)
        .header("Content-Type", "text/plain")
        .body(ROBOTS_TXT)
}
#[handler]
fn index() -> &'static str {
    "ok"
}
// ── Server bootstrap ─────────────────────────────────────────────────────
//
// .with(AiBotBlocker) applies the middleware to the entire Route.
// Middleware is outermost — it sees the request first, response last.
// Multiple .with() calls stack: last applied = outermost.
#[tokio::main]
async fn main() -> std::result::Result<(), std::io::Error> {
    let app = Route::new()
        .at("/", poem::get(index))
        .at("/robots.txt", poem::get(robots_txt))
        .with(AiBotBlocker); // applied to all routes
    Server::new(TcpListener::bind("0.0.0.0:3000"))
        .run(app)
        .await
}
3. Per-handler blocking
For simple cases, check the User-Agent inside the handler. Poem's #[handler] macro can receive &Request (borrowed) — the macro handles ownership so you don't need to extract headers before a move.
// Per-handler bot blocking — no Middleware trait needed.
//
// For simple cases, check the User-Agent inside the handler itself.
// Poem's #[handler] macro converts async fn into an Endpoint.
use poem::{handler, http::header, http::StatusCode, Request, Response, Result};
// Assumes is_ai_bot from section 1 is in scope (use ai_bot_detector::is_ai_bot;).
#[handler]
async fn protected_endpoint(req: &Request) -> Result<Response> {
    // #[handler] can receive &Request (borrowed, not moved).
    // This is a Poem convenience: the macro handles ownership.
    let ua = req
        .headers()
        .get(header::USER_AGENT)
        .and_then(|v| v.to_str().ok())
        .unwrap_or("");
    if is_ai_bot(ua) {
        return Ok(Response::builder()
            .status(StatusCode::FORBIDDEN)
            .header("X-Robots-Tag", "noai, noimageai")
            .body("Forbidden"));
    }
    Ok(Response::builder()
        .status(StatusCode::OK)
        .header("X-Robots-Tag", "noai, noimageai")
        .body("protected content"))
}
// In Route:
// Route::new().at("/api/data", poem::get(protected_endpoint))
4. Route-scoped middleware
Poem's .with() applies to the Route it's called on. Use .nest() to mount sub-routes with their own middleware — only the nested routes get bot blocking.
// Route-scoped middleware — protect only /api/* routes.
//
// In Poem, .with() applies to the Route it's called on.
// Use nested Routes to scope middleware to specific path prefixes.
use poem::{Route, EndpointExt};
let api_routes = Route::new()
    .at("/data", poem::get(api_data))
    .at("/users", poem::get(api_users))
    .with(AiBotBlocker); // only /api/* gets bot blocking
let public_routes = Route::new()
    .at("/", poem::get(index))
    .at("/about", poem::get(about));
// no bot blocking on public routes
let app = Route::new()
    .nest("/api", api_routes) // /api/data, /api/users: blocked
    .nest("/", public_routes) // /, /about: not blocked
    .at("/robots.txt", poem::get(robots_txt));
// Alternative: check path inside the middleware itself.
// Modify AiBotBlockerEndpoint::call to skip non-API paths:
//
// if !path.starts_with("/api/") {
//     return Ok(self.0.call(req).await?.into_response());
// }
Key points
transform() runs once, call() runs per-request: Middleware::transform wraps the inner Endpoint at startup. The wrapper struct's Endpoint::call handles each request. Do not put per-request logic in transform.
Request is moved into the inner endpoint: Endpoint::call takes req: Request by value. Once you call self.0.call(req).await, the request is consumed. Extract headers, path, and query parameters before forwarding.
to_str() is fallible: HeaderValue::to_str() returns Result<&str, ToStrError>, not &str. HTTP header values can contain bytes outside visible ASCII (RFC 7230 obs-text), which to_str() rejects. Use .and_then(|v| v.to_str().ok()) to silently skip such values; they are treated as not being AI bot identifiers.
.with() stacking order: Multiple .with() calls stack, and the last one applied is the outermost (it sees the request first and the response last). route.with(A).with(B) means B runs before A on the request path.
#[handler] supports &Request: Inside a #[handler] function, you can take &Request (a borrowed reference) instead of Request. The macro manages ownership, so there is no need to extract headers before a move. This does not apply to raw Endpoint impls.
.nest() for scoped middleware: .with() applies to the Route it's called on. Use .nest("/api", api_routes.with(AiBotBlocker)) to limit bot blocking to a path prefix without checking paths inside the middleware.
into_response() for type erasure: The inner endpoint's output type may differ from Response. Call .into_response() on the inner result to convert any IntoResponse type into a Response you can modify (e.g., inserting headers).
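The points about wrapping once, the moved request, and .with() stacking can be seen together in a small toy model. This is a std-only sketch: ToyRequest, ToyEndpoint, Wrapper and ToyExt are invented names that imitate the shape of Poem's types and are not Poem's API. Each wrapper logs on the way in and out, and blocks when the User-Agent contains its pattern.

```rust
// Toy model only: imitates the shape of Endpoint/Middleware, not Poem's API.
struct ToyRequest {
    user_agent: String,
}

trait ToyEndpoint {
    /// Takes the request by value, like Endpoint::call. Returns a status code.
    fn call(&self, req: ToyRequest, log: &mut Vec<String>) -> u16;
}

struct Handler;

impl ToyEndpoint for Handler {
    fn call(&self, _req: ToyRequest, log: &mut Vec<String>) -> u16 {
        log.push("handler".to_string());
        200
    }
}

/// Wraps an inner endpoint; blocks when the UA contains `blocked`.
struct Wrapper<E> {
    name: &'static str,
    blocked: &'static str,
    inner: E,
}

impl<E: ToyEndpoint> ToyEndpoint for Wrapper<E> {
    fn call(&self, req: ToyRequest, log: &mut Vec<String>) -> u16 {
        log.push(format!("{} in", self.name));
        // Read what is needed BEFORE forwarding: `req` is moved below.
        let is_blocked = req.user_agent.to_ascii_lowercase().contains(self.blocked);
        if is_blocked {
            return 403; // the inner endpoint is never called
        }
        let status = self.inner.call(req, log); // `req` is consumed here
        log.push(format!("{} out", self.name));
        status
    }
}

/// Plays the role of `.with()`: wraps `self` once, at construction time.
trait ToyExt: ToyEndpoint + Sized {
    fn with(self, name: &'static str, blocked: &'static str) -> Wrapper<Self> {
        Wrapper { name, blocked, inner: self }
    }
}

impl<E: ToyEndpoint> ToyExt for E {}
```

With Handler.with("A", "gptbot").with("B", "ccbot"), a browser request is logged as B in, A in, handler, A out, B out: the wrapper applied last runs first. A request from GPTBot stops at A, so the handler never runs, while B still logs on the way out because its inner call returned normally.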
Framework comparison — Rust web middleware models
| Framework | Middleware model | Block request | Header access |
|---|---|---|---|
| Poem | Middleware + Endpoint wrapper | Return Ok(Response 403) without calling inner | req.headers().get() → Option<&HeaderValue> |
| Axum | Tower Layer + Service | Return Ok(Response 403) without calling inner | req.headers().get() → Option<&HeaderValue> |
| Actix-web | Transform + Service | Return Ok(req.into_response(HttpResponse::Forbidden())) | req.headers().get() → Option<&HeaderValue> |
| Rocket | Request Guards (FromRequest) | Outcome::Error((Status::Forbidden, ())) | req.headers().get_one() → Option<&str> |
Poem's Middleware + Endpoint approach needs less boilerplate than a hand-written Tower or Actix middleware: no Pin<Box<dyn Future>>, no associated Future types, no async factory. The trade-off is that Poem middleware is framework-specific, while Tower middleware (Axum) is reusable across any Tower-compatible service (Tonic gRPC, raw Hyper, etc.).