How to Block AI Bots in Python Robyn
Robyn is a Python web framework with a Rust runtime (via PyO3). The HTTP server and networking run in Rust, outside the GIL, while your handlers run as Python functions. The API is Flask-like, but middleware works differently. Bot blocking uses @app.before_request() global middleware. Robyn stores all header names in lowercase, so always use request.headers.get("user-agent"), not "User-Agent". Return a Response to block and return the Request to pass; returning None is invalid.
1. Bot detection
Pure Python with no dependencies. any() over a generator short-circuits on the first match.
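You can observe the short-circuit behaviour directly. The sketch below uses a trimmed, illustrative pattern list and a hypothetical match_with_trace helper, which is not part of bot_utils.py. The helper records each pattern that any() actually evaluates:

```python
# Illustration only: a trimmed pattern list, not the full list from bot_utils.py.
PATTERNS = ["gptbot", "claudebot", "ccbot", "bytespider"]


def match_with_trace(ua: str) -> tuple[bool, list[str]]:
    """Return (matched, patterns_checked) for a lower-cased substring scan."""
    lower = ua.lower()
    checked: list[str] = []

    def probe(pattern: str) -> bool:
        checked.append(pattern)  # record every pattern any() evaluates
        return pattern in lower

    matched = any(probe(p) for p in PATTERNS)
    return matched, checked


print(match_with_trace("Mozilla/5.0 (compatible; ClaudeBot/1.0)"))
# (True, ['gptbot', 'claudebot'])
```

Ordinary browser traffic matches nothing, so every pattern gets checked. The length of the list therefore sets the worst-case cost per request.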
# bot_utils.py — AI bot detection, no external dependencies
AI_BOT_PATTERNS = [
    "gptbot",
    "chatgpt-user",
    "claudebot",
    "anthropic-ai",
    "ccbot",
    "google-extended",  # robots.txt token only; Google sends no separate crawler UA for it
    "cohere-ai",
    "meta-externalagent",
    "bytespider",
    "omgili",
    "diffbot",
    "imagesiftbot",
    "magpie-crawler",
    "amazonbot",
    "dataprovider",
    "netcraft",
]


def is_ai_bot(ua: str) -> bool:
    """Return True if ua matches a known AI crawler pattern.

    Lowercase comparison: str.lower() plus the 'in' operator.
    No regex; a literal substring match is sufficient and fast.
    """
    if not ua:
        return False
    lower = ua.lower()
    return any(pattern in lower for pattern in AI_BOT_PATTERNS)
2. @app.before_request() — global middleware
@app.before_request() with no path argument applies globally. The function must return a Request (pass) or a Response (block). The example also registers @app.after_request() to add X-Robots-Tag to every passing response. This hook runs only for requests that were not blocked.
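You can move the pass/block decision into a framework-neutral function. That way the rules are unit-testable without starting a Robyn server. This is a sketch under stated assumptions. Decision and decide are hypothetical names, the pattern list is trimmed for brevity, and header keys are assumed to be lowercase already:

```python
from dataclasses import dataclass, field

# Trimmed pattern list for illustration; the real list lives in bot_utils.py.
AI_BOT_PATTERNS = ("gptbot", "claudebot", "ccbot", "bytespider")


@dataclass
class Decision:
    """Framework-neutral outcome: block=False means 'return the Request'."""
    block: bool
    status_code: int = 200
    headers: dict[str, str] = field(default_factory=dict)


def decide(path: str, headers: dict[str, str]) -> Decision:
    # robots.txt stays reachable so crawlers can read the Disallow rules.
    if path == "/robots.txt":
        return Decision(block=False)
    # Assumes lowercase header keys, matching how the article says Robyn stores them.
    ua = (headers.get("user-agent") or "").lower()
    if any(pattern in ua for pattern in AI_BOT_PATTERNS):
        return Decision(
            block=True,
            status_code=403,
            headers={"content-type": "text/plain", "x-robots-tag": "noai, noimageai"},
        )
    return Decision(block=False)


print(decide("/", {"user-agent": "Mozilla/5.0 (compatible; GPTBot/1.0)"}).status_code)  # 403
print(decide("/robots.txt", {"user-agent": "GPTBot/1.0"}).block)  # False
```

With this split, bot_blocker only translates the result. A Decision with block=True becomes Response(status_code=..., headers=Headers(...), description="Forbidden"), and anything else becomes return request.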
# app.py — Robyn app with @before_request global middleware
from robyn import Robyn, Request, Response, Headers

from bot_utils import is_ai_bot
from routes import register_routes

app = Robyn(__file__)

# @app.before_request() with no path argument is GLOBAL: it fires for every route.
# The function receives a Request and must return either:
#   - a Request object  → pass through to the route handler
#   - a Response object → send immediately, skip the route handler
# Returning None is NOT valid in Robyn middleware; always return one of the two.
@app.before_request()
async def bot_blocker(request: Request) -> Request | Response:
    # Path guard: robots.txt must be reachable so bots can read Disallow rules.
    if request.url.path == "/robots.txt":
        return request  # pass through

    # CRITICAL: Robyn stores ALL headers in lowercase.
    #   request.headers.get("User-Agent") → None (Title-Case fails)
    #   request.headers.get("user-agent") → the value or None
    ua = request.headers.get("user-agent") or ""

    if is_ai_bot(ua):
        # Block: return a Response; Robyn sends it and skips the route handler.
        return Response(
            status_code=403,
            headers=Headers({
                "content-type": "text/plain",
                "x-robots-tag": "noai, noimageai",
            }),
            description="Forbidden",
        )

    # Pass: return the Request object; Robyn continues to the route handler.
    return request


# @after_request injects X-Robots-Tag on ALL passing responses.
# It runs AFTER the route handler completes (only for requests that passed).
# Blocked requests (where before_request returned a Response) skip this hook.
@app.after_request()
async def inject_robots_tag(response: Response) -> Response:
    response.headers["x-robots-tag"] = "noai, noimageai"
    return response


# Attach the handlers defined in routes.py, then start the server.
register_routes(app)

if __name__ == "__main__":
    app.start(port=8080)
3. Route handlers
Route handlers are only reached when the middleware returns the Request object. Robyn handlers can return a dict (auto-serialised to JSON), a str, or a Response object.
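The robots_txt handler in routes.py returns a str and hard-codes one Disallow group per crawler. A hypothetical helper could generate the same kind of text from a list of robots.txt product tokens, which makes the file easier to keep in sync with the blocklist. build_robots_txt is not part of Robyn. Note that product tokens such as "GPTBot" are not the same thing as the lowercase UA substrings in AI_BOT_PATTERNS:

```python
def build_robots_txt(blocked_tokens: list[str]) -> str:
    """Return a robots.txt body: allow everyone, then one Disallow group per token."""
    groups = ["User-agent: *\nAllow: /\n"]
    for token in blocked_tokens:
        groups.append(f"User-agent: {token}\nDisallow: /\n")
    # Join groups with a blank line between them for readability.
    return "\n".join(groups)


# Possible use inside robots_txt (assumed, not from the original routes.py):
#     return build_robots_txt(["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"])
print(build_robots_txt(["GPTBot", "ClaudeBot"]))
```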
# routes.py — route handlers (only reached when middleware passes the request)
from robyn import Robyn, Request


def register_routes(app: Robyn) -> None:
    @app.get("/robots.txt")
    async def robots_txt(request: Request) -> str:
        return """User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

    @app.get("/")
    async def index(request: Request) -> dict:
        return {"message": "Hello"}

    @app.get("/api/data")
    async def api_data(request: Request) -> dict:
        return {"data": "value"}
4. Scoped middleware — SubRouter
SubRouter groups routes under a prefix with their own middleware stack. app.include_router(api) mounts the sub-router onto the main app. Public routes on the root app are unaffected by sub-router middleware.
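SubRouter handles the prefix for you. If you instead keep one global @app.before_request() and scope it by hand, be careful with the path test: a naive request.url.path.startswith("/api") also matches /apiary. Below is a sketch of a boundary-aware check. in_scope is a hypothetical helper, not a Robyn API:

```python
def in_scope(path: str, prefix: str) -> bool:
    """True if path equals prefix or lies under it at a '/' boundary."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


# Assumed usage inside a global @app.before_request() handler:
#     if in_scope(request.url.path, "/api") and is_ai_bot(ua):
#         return Response(status_code=403, ...)
for p in ("/api", "/api/data", "/apiary", "/"):
    print(p, in_scope(p, "/api"))
```

The loop prints True, True, False, False. Only the exact prefix and paths below it at a slash boundary count as in scope.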
# Scoped middleware — protect /api routes using SubRouter.
# SubRouter lets you group routes under a prefix with their own middleware.
from robyn import Robyn, Request, Response, Headers, SubRouter

from bot_utils import is_ai_bot

app = Robyn(__file__)


# Public routes: no bot blocking
@app.get("/robots.txt")
async def robots_txt(request: Request) -> str:
    return "User-agent: *\nAllow: /\n"


@app.get("/")
async def index(request: Request) -> dict:
    return {"message": "Hello"}


# Protected API sub-router
api = SubRouter(__name__, prefix="/api")


@api.before_request()
async def api_bot_blocker(request: Request) -> Request | Response:
    ua = request.headers.get("user-agent") or ""
    if is_ai_bot(ua):
        return Response(
            status_code=403,
            headers=Headers({"x-robots-tag": "noai, noimageai"}),
            description="Forbidden",
        )
    return request


@api.get("/data")
async def api_data(request: Request) -> dict:
    return {"data": "value"}


# Register the sub-router on the main app
app.include_router(api)

if __name__ == "__main__":
    app.start(port=8080)
5. Install and run
# Install and run
pip install robyn
# Run with default single worker
python app.py
# Run with multiple processes and workers (built-in — no Gunicorn needed)
python app.py --processes 2 --workers 4 --port 8080
# Development mode with auto-reload
python app.py --dev
Key points
- Headers are lowercase, always: Robyn normalises all incoming header names to lowercase internally. request.headers.get("user-agent") works; request.headers.get("User-Agent") returns None. Flask and FastAPI differ here: their lookups are case-insensitive via Werkzeug's EnvironHeaders or Starlette's Headers class.
- Return Request to pass, Response to block, never None: In Flask, a @before_request function returns None to pass. In Robyn, you must explicitly return the Request object. Returning None from Robyn middleware is an error.
- Robyn's server is Rust, not an ASGI server: you write async def handlers, but Robyn calls them from its own Rust runtime through PyO3 rather than through Uvicorn or another ASGI server. Async handlers are still awaited on a Python event loop, so asyncio-based code generally works. As in any async framework, avoid blocking calls inside async handlers.
- @after_request does not run on blocked responses: When @before_request returns a Response, Robyn short-circuits the request and skips both the route handler and the @after_request hooks. If you need headers on the 403 response, set them in the @before_request handler itself.
- Multi-process support is built in, so no Gunicorn is needed: python app.py --processes 2 --workers 4 spawns multiple OS-level processes backed by Rust workers. This is Robyn's equivalent of Gunicorn's worker model, without an external process manager.
- @app.before_request() with no argument is global; with a path string it is scoped: @app.before_request("/api") would scope the middleware to /api only. Without an argument it fires for every request on the app.
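You can reproduce the lowercase-key behaviour from the first point with a plain dict. This is also handy for unit tests that stub out request headers. The sketch uses only dict and does not touch Robyn's own Headers class. normalize_headers and get_header are hypothetical helpers:

```python
def normalize_headers(raw: dict[str, str]) -> dict[str, str]:
    """Lower-case every header name; if two names differ only by case, the later one wins."""
    return {name.lower(): value for name, value in raw.items()}


def get_header(headers: dict[str, str], name: str, default: str = "") -> str:
    """Look up a header in a normalised dict, whatever case the caller uses."""
    return headers.get(name.lower(), default)


stub = normalize_headers({"User-Agent": "Mozilla/5.0 (compatible; GPTBot/1.0)", "Accept": "*/*"})
print(stub.get("User-Agent"))          # None: the Title-Case key no longer exists
print(get_header(stub, "User-Agent"))  # Mozilla/5.0 (compatible; GPTBot/1.0)
```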
Framework comparison — Python web frameworks
| Framework | Middleware | Block | UA access |
|---|---|---|---|
| Robyn | @app.before_request() | return Response(status_code=403, ...) | request.headers.get("user-agent") (lowercase only) |
| Flask | @app.before_request | return make_response("Forbidden", 403) | request.headers.get("User-Agent") (case-insensitive) |
| FastAPI | @app.middleware("http") | return Response(status_code=403) | request.headers.get("user-agent") (case-insensitive) |
| Sanic | @app.middleware("request") | return HTTPResponse("Forbidden", 403) | request.headers.get("User-Agent") (case-insensitive) |
Robyn is the only framework in this table that requires exact lowercase header keys; the others normalise case internally. The pass convention also differs. Flask and Sanic request hooks return None to pass, FastAPI middleware passes by returning await call_next(request), and Robyn returns the Request object. All four block the same way, by returning a response object.
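Whichever framework you choose, you can check the blocking from outside the app. HTTP header names are case-insensitive on the wire, so a client can send "User-Agent" in any case; the lowercase rule only matters for server-side lookups. This is a minimal stdlib sketch. status_for is a hypothetical helper, and the sample UA strings are illustrative, not the crawlers' exact strings:

```python
import urllib.error
import urllib.request


def status_for(url: str, user_agent: str, timeout: float = 5.0) -> int:
    """Send a GET with the given User-Agent and return the HTTP status code."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the status code is still available.
        return err.code


# Example (assumes the Robyn app is listening on 127.0.0.1:8080):
#     status_for("http://127.0.0.1:8080/", "Mozilla/5.0 (compatible; GPTBot/1.0)")
#     status_for("http://127.0.0.1:8080/robots.txt", "Mozilla/5.0 (compatible; GPTBot/1.0)")
```

With the global middleware from section 2 active, the first call should report 403. The robots.txt call should still succeed because of the path guard.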