How to Block AI Bots in Rust Salvo
Salvo is a Rust web framework built on hyper and Tokio with a macro-driven handler model. Middleware uses the #[handler] macro and a FlowCtrl argument to control the handler chain. req.headers().get() returns Option<&HeaderValue> — chain .and_then(|v| v.to_str().ok()).unwrap_or("") for a safe &str. To block: set the response then call ctrl.skip_rest() — without skip_rest(), subsequent handlers still run and can overwrite the response. Middleware is registered with .hoop().
1. Bot detection
Pure Rust, no dependencies. str::contains() for literal substring matching. Iterator::any() short-circuits on first match.
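The two standard-library calls named above can be seen in isolation in the short sketch below. It is a hypothetical variant, not the module that follows: `contains_ignore_ascii_case`, `PATTERNS`, and `is_ai_bot_no_alloc` are illustrative names, and the approach assumes every pattern is lowercase ASCII. Comparing byte windows with `eq_ignore_ascii_case` avoids allocating a lowercased copy of the User-Agent string.

```rust
/// Hypothetical helper: true if `haystack` contains `needle`, ignoring ASCII case.
/// Compares fixed-size byte windows, so no lowercased copy is allocated.
fn contains_ignore_ascii_case(haystack: &str, needle: &str) -> bool {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    if n.is_empty() {
        // An empty needle is contained in every string (and windows(0) would panic).
        return true;
    }
    h.windows(n.len()).any(|w| w.eq_ignore_ascii_case(n))
}

/// Illustrative subset of patterns (assumption: ASCII only).
const PATTERNS: &[&str] = &["gptbot", "claudebot", "ccbot"];

/// Iterator::any() stops at the first pattern that matches.
fn is_ai_bot_no_alloc(ua: &str) -> bool {
    PATTERNS
        .iter()
        .any(|pat| contains_ignore_ascii_case(ua, pat))
}
```

For ASCII patterns this should agree with the `to_lowercase()` approach on ordinary User-Agent strings; whether the saved allocation matters depends on request volume, so it is worth measuring before adopting.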
// bot_utils.rs: AI bot detection, no external dependencies

pub const AI_BOT_PATTERNS: &[&str] = &[
    "gptbot",
    "chatgpt-user",
    "claudebot",
    "anthropic-ai",
    "ccbot",
    "google-extended",
    "cohere-ai",
    "meta-externalagent",
    "bytespider",
    "omgili",
    "diffbot",
    "imagesiftbot",
    "magpie-crawler",
    "amazonbot",
    "dataprovider",
    "netcraft",
];

/// Returns true if `ua` matches a known AI crawler pattern.
/// str::contains() does a literal substring match, no regex.
/// to_lowercase() allocates; for hot paths consider an ASCII case-insensitive comparison.
pub fn is_ai_bot(ua: &str) -> bool {
    if ua.is_empty() {
        return false;
    }
    let lower = ua.to_lowercase();
    AI_BOT_PATTERNS.iter().any(|&pat| lower.contains(pat))
}
2. #[handler] middleware with FlowCtrl
The #[handler] macro generates the boilerplate needed to use an async function as a Salvo handler or middleware. ctrl.skip_rest() is the key — it halts all remaining handlers in the chain. Without it, execution falls through to route handlers regardless of the response you set.
// middleware.rs: Salvo bot-blocking handler
use salvo::http::StatusCode;
use salvo::prelude::*;

use crate::bot_utils::is_ai_bot;

/// #[handler] turns this async function into a Salvo Handler.
/// Middleware handlers receive (&mut Request, &mut Response, &mut FlowCtrl).
/// To block: set the response and call ctrl.skip_rest().
/// To pass through: leave ctrl untouched and the chain continues.
#[handler]
pub async fn bot_blocker(req: &mut Request, res: &mut Response, ctrl: &mut FlowCtrl) {
    // Path guard: robots.txt must stay reachable so bots can read the Disallow rules.
    if req.uri().path() == "/robots.txt" {
        // No ctrl.skip_rest(): the chain continues to the robots.txt handler.
        return;
    }

    // req.headers().get() returns Option<&HeaderValue>.
    // to_str() fails if the value contains bytes outside visible ASCII.
    // unwrap_or("") is a safe empty-string fallback when the header is absent or unreadable.
    // Header name lookup is case-insensitive (header names are stored in lowercase).
    let ua = req
        .headers()
        .get("user-agent")
        .and_then(|v| v.to_str().ok())
        .unwrap_or("");

    if is_ai_bot(ua) {
        // Block: set status, inject headers, write body.
        // ctrl.skip_rest() MUST be called to stop subsequent handlers from running.
        // Without it, the next handler in the chain could overwrite this response.
        res.status_code(StatusCode::FORBIDDEN);
        res.headers_mut().insert(
            "x-robots-tag",
            "noai, noimageai".parse().unwrap(),
        );
        res.headers_mut().insert(
            "content-type",
            "text/plain; charset=utf-8".parse().unwrap(),
        );
        res.render(Text::Plain("Forbidden"));
        ctrl.skip_rest(); // halt the handler chain, then return immediately
        return;
    }

    // Pass: inject X-Robots-Tag and let the chain continue.
    // Do NOT call ctrl.skip_rest() here: the route handler must run.
    res.headers_mut().insert(
        "x-robots-tag",
        "noai, noimageai".parse().unwrap(),
    );
}
3. main.rs: global .hoop() registration
Router::new().hoop(bot_blocker) registers the middleware on the root router, so it runs ahead of every route handler matched on that router. Multiple .hoop() calls run in registration order.
// main.rs: Salvo app with global bot-blocking middleware
use salvo::prelude::*;

mod bot_utils;
mod middleware;

use middleware::bot_blocker;

#[handler]
async fn robots_txt(res: &mut Response) {
    res.headers_mut().insert(
        "content-type",
        "text/plain; charset=utf-8".parse().unwrap(),
    );
    res.render(Text::Plain(
        "User-agent: *\nAllow: /\n\n\
         User-agent: GPTBot\nDisallow: /\n\n\
         User-agent: ClaudeBot\nDisallow: /\n\n\
         User-agent: CCBot\nDisallow: /\n\n\
         User-agent: Google-Extended\nDisallow: /\n",
    ));
}

#[handler]
async fn index(res: &mut Response) {
    res.render(Json(serde_json::json!({ "message": "Hello" })));
}

#[handler]
async fn api_data(res: &mut Response) {
    res.render(Json(serde_json::json!({ "data": "value" })));
}

#[tokio::main]
async fn main() {
    // .hoop() registers middleware on the router.
    // Middleware runs before route handlers, in registration order.
    // Router::get() takes only the handler: the path comes from Router::with_path(),
    // and child routers are attached with .push().
    let router = Router::new()
        .hoop(bot_blocker) // global: applies to every route on this router
        .get(index) // GET /
        .push(Router::with_path("robots.txt").get(robots_txt))
        .push(Router::with_path("api/data").get(api_data));

    let acceptor = TcpListener::new("0.0.0.0:8080").bind().await;
    Server::new(acceptor).serve(router).await;
}
4. Scoped middleware: nested Router
Router::with_path("/api").hoop(bot_blocker) scopes the middleware to /api/**. Push it onto the root router with .push(). Routes on the root are unaffected.
// Scoped middleware: protect /api routes using a nested Router.
// Routes on the root router are NOT affected by api-scoped middleware.
// api_status is assumed to be another #[handler], defined like api_data.
#[tokio::main]
async fn main() {
    // Root router: no bot blocking
    let root = Router::new()
        .get(index) // GET /
        .push(Router::with_path("robots.txt").get(robots_txt));

    // Nested /api router: bot blocker scoped to /api/**
    // with_path() sets the path prefix for all routes pushed onto this router.
    let api = Router::with_path("/api")
        .hoop(bot_blocker) // only /api/** routes are protected
        .push(Router::with_path("data").get(api_data))
        .push(Router::with_path("status").get(api_status));

    // .push() mounts the nested router onto the root router.
    let router = root.push(api);

    let acceptor = TcpListener::new("0.0.0.0:8080").bind().await;
    Server::new(acceptor).serve(router).await;
}
5. Cargo.toml
# Cargo.toml
[package]
name = "bot-blocker"
version = "0.1.0"
edition = "2021"
[dependencies]
salvo = { version = "0.70", features = ["full"] }
tokio = { version = "1", features = ["full"] }
serde_json = "1"
# Run: cargo run
# Build release: cargo build --release
Key points
- ctrl.skip_rest() is required to block: Setting a 403 response without calling ctrl.skip_rest() does not stop the chain. The next handler runs and can overwrite your response. Always call ctrl.skip_rest() after setting the blocking response, then return.
- req.headers().get() returns Option<&HeaderValue>: The header may be absent (None) or contain bytes outside visible ASCII (making .to_str() fail). Chain .and_then(|v| v.to_str().ok()).unwrap_or("") to handle both cases with a safe empty-string default. Header name lookup is case-insensitive (header names are stored in lowercase).
- Set headers before res.render(): res.render() may finalise parts of the response. Insert custom headers via res.headers_mut().insert() before calling render() to ensure they appear in the response.
- .hoop() registers middleware, and multiple calls chain in order: The first .hoop() call runs first. Middleware registered on a nested router only applies to that router's routes, so root routes are unaffected. This is the equivalent of app.use() in Express or r.Use() in Go routers.
- #[handler] is flexible, and not all parameters are required: A handler can take any subset of (&mut Request, &mut Response, &mut FlowCtrl). If your middleware only needs to read the request and modify the response, you can omit FlowCtrl from the signature, but then you cannot call skip_rest(). Include it whenever you need to block.
- Do not call ctrl.skip_rest() on the pass branch: If skip_rest() is called when passing through, the route handler never runs, and the response you set (or an empty response) is returned. Only call it when you intend to block.
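The Option chain from the second point can be tried without Salvo. The sketch below is a stand-in, not Salvo's or the http crate's code: `FakeHeaderValue`, its `to_str()`, and `user_agent` are hypothetical names, and the conversion rule (succeed only when every byte is visible ASCII or a tab) is an approximation of the real header type's behaviour.

```rust
/// Hypothetical stand-in for a header value: the raw bytes as received.
struct FakeHeaderValue(Vec<u8>);

impl FakeHeaderValue {
    /// Approximation of the real conversion rule: succeed only if every
    /// byte is visible ASCII (0x20..=0x7E) or a horizontal tab.
    fn to_str(&self) -> Result<&str, ()> {
        let all_visible = self
            .0
            .iter()
            .all(|&b| (0x20..=0x7e).contains(&b) || b == b'\t');
        if all_visible {
            // Every byte is ASCII here, so the UTF-8 check always passes.
            std::str::from_utf8(&self.0).map_err(|_| ())
        } else {
            Err(())
        }
    }
}

/// Same shape as the chain in the middleware:
/// an absent or unreadable header becomes "".
fn user_agent(header: Option<&FakeHeaderValue>) -> &str {
    header.and_then(|v| v.to_str().ok()).unwrap_or("")
}
```

Both failure modes collapse to the empty string, which the detection function treats as "not a bot". Whether an unreadable User-Agent should instead be blocked is a policy choice the chain leaves open.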
Framework comparison — Rust web frameworks
| Framework | Middleware style | Block | Pass |
|---|---|---|---|
| Salvo | #[handler] + FlowCtrl | set response then ctrl.skip_rest() | return without skip_rest() |
| Axum | tower Service or from_fn | return Response::builder().status(403)... | next.run(req).await |
| Actix-web | Transform + Service trait | ok(HttpResponse::Forbidden().finish()) | self.service.call(req).await |
| Warp | Filter composition | Err(warp::reject::custom(Blocked)) | Ok(req) (filter passes value through) |
Salvo's FlowCtrl model is the most distinctive — blocking requires an explicit skip_rest() call rather than a return value or an error type. Axum and Actix-web both use return-value blocking (return a Response directly), while Warp uses filter rejection. Salvo's #[handler] macro avoids the boilerplate of Axum's tower Service trait or Actix-web's Transform + Service pair.