
How to Block AI Bots on Tornado (Python): Complete 2026 Guide

Tornado is one of Python's earliest async web frameworks — built before asyncio existed, used by Jupyter Notebook, and still a top choice for WebSocket servers and long-polling APIs. Unlike FastAPI and Starlette, Tornado has no middleware stack; bot blocking instead uses a BaseHandler class with a prepare() override that all route handlers inherit.

Tornado has no middleware — use BaseHandler

Tornado's design is class-based, not middleware-based. The idiomatic approach is to create a BaseHandler(RequestHandler) that overrides prepare(), and have every route handler in your app extend it. prepare() runs before get(), post(), or any HTTP verb handler — it is Tornado's per-request entry point, equivalent to middleware in Flask, Express, or Gin.
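A minimal sketch of that lifecycle (the handler names here are illustrative):

import tornado.web

class BaseHandler(tornado.web.RequestHandler):
    def prepare(self):
        # Runs on every request, before get()/post()/any verb method
        print(f"prepare() ran for {self.request.path}")

class HomeHandler(BaseHandler):
    def get(self):
        # Reached only if prepare() did not finish the response
        self.write("home")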

Protection layers

1. robots.txt — StaticFileHandler route at /robots.txt, registered before BaseHandler routes so it bypasses bot blocking
2. noai meta tag — BaseHandler.get_template_namespace() injects a robots variable into all templates; default "noai, noimageai"
3. X-Robots-Tag header — self.set_header("X-Robots-Tag", "noai, noimageai") in BaseHandler.prepare() for all non-blocked requests
4. Hard 403 block — self.send_error(403) in prepare() finishes the response, so get()/post() never runs

Layer 1: robots.txt

Register /robots.txt using StaticFileHandler as the first route in your application. Tornado matches routes in order — placing it first ensures static files are served before any bot-blocking logic.

# static/robots.txt

User-agent: *
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: Applebot-Extended
User-agent: PerplexityBot
User-agent: Diffbot
User-agent: cohere-ai
User-agent: FacebookBot
User-agent: omgili
User-agent: omgilibot
User-agent: Amazonbot
User-agent: DeepSeekBot
User-agent: MistralBot
User-agent: xAI-Bot
User-agent: AI2Bot
Disallow: /

app.py — register the robots.txt route first

import tornado.web

app = tornado.web.Application([
    # Static files — registered FIRST so they bypass BaseHandler.
    # StaticFileHandler expects a directory as its "path" argument and a
    # regex capture group for the filename.
    (r"/(robots\.txt)", tornado.web.StaticFileHandler, {"path": "static"}),
    (r"/static/(.*)", tornado.web.StaticFileHandler, {"path": "static"}),

    # Your route handlers — all extend BaseHandler
    (r"/", HomeHandler),
    (r"/api/data", DataHandler),
])

Layer 2: noai meta tag

Override get_template_namespace() in BaseHandler to inject a robots variable into every template automatically:

BaseHandler — inject robots into template namespace

class BaseHandler(tornado.web.RequestHandler):
    def get_template_namespace(self):
        ns = super().get_template_namespace()
        ns['robots'] = getattr(self, '_robots', 'noai, noimageai')
        return ns

templates/base.html (Tornado's built-in template engine — the {{ robots }} expression also works unchanged in Jinja2)

<meta name="robots" content="{{ robots }}">

Per-handler override

class PublicBlogHandler(BaseHandler):
    async def get(self):
        self._robots = 'index, follow'  # overrides default
        self.render('blog.html')

Layers 3 & 4: BaseHandler with prepare()

prepare() is called before every HTTP method handler. Override it in BaseHandler and have all your handlers extend it:

handlers/base.py

import tornado.web

AI_BOT_PATTERNS = [
    "gptbot", "chatgpt-user", "oai-searchbot",
    "claudebot", "anthropic-ai", "claude-web",
    "google-extended", "ccbot", "bytespider",
    "applebot-extended", "perplexitybot", "diffbot",
    "cohere-ai", "facebookbot", "meta-externalagent",
    "omgili", "omgilibot", "amazonbot",
    "deepseekbot", "mistralbot", "xai-bot", "ai2-bot",
]

EXEMPT_PATHS = {"/robots.txt", "/sitemap.xml", "/favicon.ico"}


class BaseHandler(tornado.web.RequestHandler):
    # Set to True in a specific handler to bypass bot blocking
    ALLOW_AI_BOTS: bool = False

    def prepare(self):
        # Exempt paths always pass through
        if self.request.path in EXEMPT_PATHS:
            return

        # Per-handler opt-out
        if self.ALLOW_AI_BOTS:
            return

        ua = self.request.headers.get("User-Agent", "").lower()

        for pattern in AI_BOT_PATTERNS:
            if pattern in ua:
                # Layer 4: hard 403 block.
                # send_error() writes the 403 response and calls finish();
                # Tornado then sees the finished response and skips the
                # get()/post() method entirely.
                self.send_error(403)
                return

        # Layer 3: set X-Robots-Tag for all legitimate requests
        self.set_header("X-Robots-Tag", "noai, noimageai")

Key points

  • Blocking: self.send_error(403) writes the HTTP 403 response and calls finish(). After prepare() returns, Tornado sees the response is finished and never calls get() or post(). The return after it matters: it exits prepare() immediately, so the X-Robots-Tag line below never touches an already-finished response. The test sketch after this list verifies both paths.
  • Reading User-Agent: self.request.headers is an HTTPHeaders object, so the "User-Agent" lookup is case-insensitive. The empty-string default avoids AttributeError on .lower() when bots omit the header entirely.
  • X-Robots-Tag: self.set_header() queues the header for the response. Calling it in prepare() before get()/post() runs is safe because Tornado buffers headers until the response is first flushed.
  • Per-handler opt-out: Set ALLOW_AI_BOTS = True on any handler class to bypass bot blocking for that route.
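
These behaviors are easy to verify with tornado.testing.AsyncHTTPTestCase. A minimal sketch — the inline HomeHandler is illustrative, and BaseHandler is assumed importable from handlers/base.py above:

import tornado.testing
import tornado.web

from handlers.base import BaseHandler


class HomeHandler(BaseHandler):
    async def get(self):
        self.write("Hello")


class BotBlockingTest(tornado.testing.AsyncHTTPTestCase):
    def get_app(self):
        return tornado.web.Application([(r"/", HomeHandler)])

    def test_ai_bot_gets_403(self):
        resp = self.fetch("/", headers={"User-Agent": "Mozilla/5.0 (compatible; GPTBot/1.0)"})
        self.assertEqual(resp.code, 403)

    def test_browser_gets_200_with_header(self):
        resp = self.fetch("/", headers={"User-Agent": "Mozilla/5.0"})
        self.assertEqual(resp.code, 200)
        self.assertEqual(resp.headers.get("X-Robots-Tag"), "noai, noimageai")

Run it with python -m unittest or pytest.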

Route handlers extending BaseHandler

import tornado.websocket

from handlers.base import BaseHandler


class HomeHandler(BaseHandler):
    async def get(self):
        # prepare() already ran — bot blocked or X-Robots-Tag set
        self.write("Hello, World!")


class ApiHandler(BaseHandler):
    async def get(self):
        self.set_header("Content-Type", "application/json")
        self.write('{"status": "ok"}')


class PublicFeedHandler(BaseHandler):
    # This route intentionally allows AI crawlers
    ALLOW_AI_BOTS = True

    async def get(self):
        self.write("Public RSS feed — AI bots welcome")


class ChatSocketHandler(tornado.websocket.WebSocketHandler, BaseHandler):
    # WebSocket handlers can also extend BaseHandler for prepare() checks
    def open(self):
        self.write_message("Connected")


# Application setup
app = tornado.web.Application([
    (r"/robots.txt", tornado.web.StaticFileHandler, {"path": "static/robots.txt"}),
    (r"/", HomeHandler),
    (r"/api/data", ApiHandler),
    (r"/feed", PublicFeedHandler),
])

Async prepare()

prepare() can be async — useful if you need to look up IP reputation or check a rate-limit store before deciding to block:

class BaseHandler(tornado.web.RequestHandler):
    async def prepare(self):
        """Async prepare — Tornado 6.1+ supports async prepare()."""
        if self.request.path in EXEMPT_PATHS:
            return

        ua = self.request.headers.get("User-Agent", "").lower()

        for pattern in AI_BOT_PATTERNS:
            if pattern in ua:
                self.send_error(403)
                return

        # Could await a Redis rate-limit check here
        # allowed = await self.settings["redis"].get(ip_key)

        self.set_header("X-Robots-Tag", "noai, noimageai")

Tornado awaits prepare() automatically when it returns an awaitable. No configuration change needed — just add async.
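
As a concrete example, here is a sketch of a per-IP rate limit using redis-py's asyncio client — the redis settings key, the 100-requests-per-minute threshold, and the client wiring are assumptions, not part of this guide's stack:

import tornado.web
from redis import asyncio as aioredis  # pip install redis


class RateLimitedBase(tornado.web.RequestHandler):
    async def prepare(self):
        # Hypothetical: a redis.asyncio.Redis client stored in app settings
        r = self.settings["redis"]
        key = f"ratelimit:{self.request.remote_ip}"
        count = await r.incr(key)    # atomic per-IP counter
        if count == 1:
            await r.expire(key, 60)  # fixed 60-second window
        if count > 100:              # hypothetical threshold
            self.send_error(429)     # Too Many Requests


def make_rate_limited_app(routes):
    # Keyword arguments to Application show up in self.settings
    return tornado.web.Application(routes, redis=aioredis.Redis())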

Comparison: Tornado vs FastAPI vs Django

Tornado — BaseHandler.prepare()

class BaseHandler(RequestHandler):
    def prepare(self):
        ua = self.request.headers.get("User-Agent", "").lower()
        if any(p in ua for p in AI_BOT_PATTERNS):
            self.send_error(403)

FastAPI / Starlette — BaseHTTPMiddleware

class AiBotBlocker(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        ua = request.headers.get("user-agent", "").lower()
        if any(p in ua for p in AI_BOT_PATTERNS):
            return Response("Forbidden", status_code=403)
        return await call_next(request)

Django — process_request() middleware

class AiBotBlockerMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response
    def __call__(self, request):
        ua = request.META.get("HTTP_USER_AGENT", "").lower()
        if any(p in ua for p in AI_BOT_PATTERNS):
            return HttpResponseForbidden("Forbidden")
        return self.get_response(request)

All three patterns achieve the same result. Tornado's prepare() is the least obvious but idiomatic — it is the same hook Tornado's own documentation recommends for asynchronous per-request setup, such as looking up current_user.

Running Tornado

# Install
pip install tornado

# app.py
import asyncio
import tornado.web
from handlers.base import BaseHandler

class HomeHandler(BaseHandler):
    async def get(self):
        self.write("Hello!")

def make_app():
    return tornado.web.Application([
        (r"/robots.txt", tornado.web.StaticFileHandler, {"path": "static/robots.txt"}),
        (r"/", HomeHandler),
    ])

if __name__ == "__main__":
    app = make_app()
    app.listen(8888)
    asyncio.get_event_loop().run_forever()

# Run
python app.py

# Production — run multiple processes (one per CPU core)
# tornado.process.fork_processes(0)  # 0 = one per CPU

Tornado runs its own IOLoop — no ASGI server (uvicorn, gunicorn) needed. For server-level blocking before Python runs, place nginx in front and use a map $http_user_agent block. See the nginx guide.
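
The multi-process comment above expands to the pattern from Tornado's docs — bind the sockets in the parent, fork, then start a server in each child. A sketch, assuming the make_app() factory defined earlier:

import asyncio

import tornado.httpserver
import tornado.netutil
import tornado.process


def run_forked(port: int = 8888) -> None:
    # Bind listening sockets before forking so all children share them
    sockets = tornado.netutil.bind_sockets(port)
    # Fork one child per CPU core (0 = autodetect)
    tornado.process.fork_processes(0)

    async def serve():
        server = tornado.httpserver.HTTPServer(make_app())
        server.add_sockets(sockets)
        await asyncio.Event().wait()

    asyncio.run(serve())


if __name__ == "__main__":
    run_forked()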

Verification

# Should return 403 (blocked AI bot)
curl -I -A "GPTBot" http://localhost:8888/

# Should return 200 (regular browser)
curl -I -A "Mozilla/5.0" http://localhost:8888/

# robots.txt must always return 200
curl -I -A "GPTBot" http://localhost:8888/robots.txt

# Check X-Robots-Tag on legitimate request
curl -si -A "Mozilla/5.0" http://localhost:8888/ | grep -i x-robots

Default Tornado port is 8888. Expected: GPTBot → 403. Mozilla/5.0 → 200 with X-Robots-Tag: noai, noimageai. robots.txt → 200 for any user agent.
