
How to Block AI Bots on Wagtail

Wagtail is the most popular Django-based CMS — used by NASA, Google, Mozilla, and thousands of agencies worldwide. It sits on top of Django's request/response pipeline and adds its own page-serving layer with hooks. This means you have two complementary places to block AI crawlers: standard Django middleware (for all requests) and Wagtail's before_serve_page hook (for CMS-managed pages, with access to the page object). This guide covers all four protection layers: robots.txt, noai meta tags in templates, X-Robots-Tag headers, and hard 403 blocking — with Wagtail-specific patterns throughout.

9 min read · Updated April 2026 · Wagtail 5.x / 6.x

1. robots.txt

Wagtail doesn't include built-in robots.txt handling — you wire it up through Django's URL routing. This section covers three approaches: a Django TemplateView (flexible, supports template logic), a dynamic view (environment-aware), and a static file served directly by Nginx (fastest, no Django overhead).

Option A: TemplateView (recommended)

Create templates/robots.txt in your templates directory:

# Block all AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Diffbot
Disallow: /

# Allow standard search crawlers
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /

Wire it in your root urls.py:

# myproject/urls.py
from django.urls import path, include
from django.views.generic import TemplateView
from wagtail import urls as wagtail_urls

urlpatterns = [
    # robots.txt — must come before the catch-all wagtail_urls
    path(
        "robots.txt",
        TemplateView.as_view(
            template_name="robots.txt",
            content_type="text/plain; charset=utf-8",
        ),
    ),

    # ... your other URL patterns ...
    path("", include(wagtail_urls)),
]

URL order matters: The robots.txt path must appear before the Wagtail catch-all URL pattern (include(wagtail_urls)). If the catch-all is listed first, Wagtail intercepts /robots.txt and returns a 404, because no Wagtail page exists at that path.

Option B: Dynamic view (environment-aware)

For different robots.txt content in staging vs production, use a view function in myapp/views.py:

from django.conf import settings
from django.http import HttpResponse

AI_BOTS = """User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /
"""

def robots_txt(request):
    if settings.DEBUG:
        # Block everything on staging/local
        content = "User-agent: *\nDisallow: /\n"
    else:
        content = AI_BOTS
    return HttpResponse(content, content_type="text/plain; charset=utf-8")

# urls.py
from myapp.views import robots_txt

urlpatterns = [
    path("robots.txt", robots_txt),
    # ...
]
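The DEBUG switch inside robots_txt() is easy to check in isolation. A minimal sketch of the same logic as a pure function (plain Python, no Django required):

```python
# The DEBUG switch from robots_txt(), extracted as a pure function
# so the two outputs can be verified without a running server
def robots_content(debug: bool, production_rules: str) -> str:
    if debug:
        # Staging/local: block every crawler
        return "User-agent: *\nDisallow: /\n"
    return production_rules

print(robots_content(True, "full rules") == "User-agent: *\nDisallow: /\n")  # → True
print(robots_content(False, "full rules") == "full rules")                   # → True
```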

Option C: Nginx static file (fastest)

For high-traffic sites, bypass Django entirely and let Nginx serve robots.txt:

# /etc/nginx/sites-available/mysite
server {
    listen 80;
    server_name example.com;

    # Serve robots.txt without hitting Gunicorn/Django
    location = /robots.txt {
        alias /var/www/mysite/robots.txt;
        add_header Content-Type text/plain;
        access_log off;
    }

    location / {
        proxy_pass http://127.0.0.1:8000;
        # ... proxy headers ...
    }
}

Place your robots.txt file at /var/www/mysite/robots.txt and reload Nginx. This approach adds zero latency to your Django application server.

2. noai meta tags in Wagtail templates

Wagtail pages render through Django templates (or optionally Jinja2). The standard pattern is a base.html template that all page templates extend. Add the noai meta tag here for site-wide coverage.

Hardcoded site-wide tag

The simplest approach — edit templates/base.html:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">

  {# AI bot protection — apply to every page #}
  <meta name="robots" content="noai, noimageai">

  <title>{% block title %}{{ page.seo_title|default:page.title }}{% endblock %}</title>

  {# Wagtail's built-in search description meta #}
  {% if page.search_description %}
  <meta name="description" content="{{ page.search_description }}">
  {% endif %}

  {% block extra_head %}{% endblock %}
</head>
<body>
  {% block content %}{% endblock %}
</body>
</html>

Per-page override via template variable

If you want editors to control robots per page (see Section 3 for the Page model setup):

{# In templates/base.html #}
<meta name="robots" content="{{ page.robots_tag|default:'noai, noimageai' }}">

With this pattern, the default is noai, noimageai unless an editor has explicitly set a different value in the Wagtail admin.
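Django's |default filter substitutes the fallback whenever the value is falsy, and an empty CharField is falsy — so a blank robots_tag yields the site default. The behaviour, modeled in plain Python:

```python
# The |default fallback from the template above, modeled in plain Python:
# an empty (falsy) robots_tag falls through to the site default
def robots_value(page_robots_tag: str) -> str:
    return page_robots_tag or "noai, noimageai"

print(robots_value(""))               # → noai, noimageai
print(robots_value("index, follow"))  # → index, follow
```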

Jinja2 syntax

If your Wagtail project uses Jinja2 templates (configured via TEMPLATES[x]['BACKEND'] = 'django.template.backends.jinja2.Jinja2'):

{# jinja2/base.html #}
<meta name="robots" content="{{ page.robots_tag if page.robots_tag else 'noai, noimageai' }}">

Wagtail's built-in SEO fields: Wagtail's base Page model already includes seo_title and search_description in its promote_panels. Adding a robots_tag field extends this pattern naturally — editors manage SEO signals in one place.

3. Per-page robots control via Page model

Wagtail's CMS gives editors control over each page. Adding a robots_tag field to your base page model lets them set per-page AI bot instructions from the Wagtail admin — no code changes needed per page.

Adding the field to your base page model

In Wagtail, most projects have a base page model that all page types inherit from. Add the field there so it's available on every page type:

# myapp/models.py
from django.db import models
from wagtail.models import Page
from wagtail.admin.panels import FieldPanel, MultiFieldPanel


class BasePage(Page):
    """Abstract base page with AI bot robots control."""

    robots_tag = models.CharField(
        max_length=200,
        blank=True,
        default="",
        help_text=(
            "robots meta tag value. Leave blank to use the site default (noai, noimageai). "
            "Examples: 'noai, noimageai' · 'index, follow' · 'noindex, nofollow'"
        ),
    )

    # Add to promote_panels so it appears in the Promote tab
    promote_panels = Page.promote_panels + [
        MultiFieldPanel(
            [
                FieldPanel("robots_tag"),
            ],
            heading="AI & Bot Control",
        ),
    ]

    class Meta:
        abstract = True


# Example page type inheriting from BasePage
class BlogPage(BasePage):
    body = models.TextField()

    content_panels = Page.content_panels + [
        FieldPanel("body"),
    ]

    class Meta:
        verbose_name = "Blog Page"

Apply the migration:

python manage.py makemigrations myapp
python manage.py migrate

After migration, the Wagtail editor shows an “AI & Bot Control” section in the Promote tab of every page. Editors can leave it blank (site default applies) or enter a custom value such as index, follow for pages that should be indexed by AI search engines.

Existing page types: If your project already has page types inheriting directly from Wagtail's Page, you can either: (a) add an abstract BasePage as shown above and update all page types to inherit from it, or (b) add robots_tag directly to each page type individually. Option (a) is cleaner at scale.

Template integration

Update base.html to use the field:

{# templates/base.html #}
{% if page %}
  <meta name="robots" content="{{ page.robots_tag|default:'noai, noimageai' }}">
{% else %}
  {# Non-page Django views (admin, custom views) — use site default #}
  <meta name="robots" content="noai, noimageai">
{% endif %}

The {% if page %} guard handles non-Wagtail views (Django admin, custom views) where the page context variable isn't set.

4. X-Robots-Tag via Wagtail hooks

HTTP headers are more reliable than meta tags — some AI crawlers parse headers without rendering HTML. Wagtail's hook system lets you modify responses for all CMS-managed pages from a central location.

after_serve_page hook

The after_serve_page hook fires after Wagtail has rendered a page and produced a response. Use it to add headers:

# myapp/wagtail_hooks.py
from wagtail import hooks


@hooks.register("after_serve_page")
def add_robots_header(page, request, response):
    """Add X-Robots-Tag to all Wagtail page responses."""

    # Use per-page override if set, otherwise use site default
    robots_value = getattr(page, "robots_tag", "") or "noai, noimageai"
    response["X-Robots-Tag"] = robots_value
    return response

Hook auto-discovery: Wagtail automatically discovers wagtail_hooks.py in any app listed in INSTALLED_APPS — no manual import or registration is needed.

Consistent meta tag + header

Use the hook to keep the header value consistent with the per-page field:

@hooks.register("after_serve_page")
def add_robots_header(page, request, response):
    """Mirror the robots meta tag in the X-Robots-Tag header."""
    robots_value = getattr(page, "robots_tag", "") or "noai, noimageai"

    # Only set on HTML responses
    content_type = response.get("Content-Type", "")
    if "text/html" in content_type:
        response["X-Robots-Tag"] = robots_value

    return response

This ensures the HTTP header matches whatever the editor set (or the site default), keeping robots.txt, meta tags, and response headers consistent.

5. Hard 403 blocking

The most effective protection: reject AI crawler requests before Wagtail renders the page. The bot receives a 403 response with no content to train on.

before_serve_page hook

The before_serve_page hook fires before Wagtail renders any CMS-managed page. Returning an HttpResponse from the hook short-circuits page serving:

# myapp/wagtail_hooks.py
from django.http import HttpResponse
from wagtail import hooks

AI_BOT_PATTERNS = [
    "GPTBot",
    "ClaudeBot",
    "Claude-Web",
    "anthropic-ai",
    "CCBot",
    "Google-Extended",
    "PerplexityBot",
    "Applebot-Extended",
    "Amazonbot",
    "meta-externalagent",
    "Bytespider",
    "Diffbot",
    "YouBot",
    "cohere-ai",
]


def is_ai_bot(request):
    """Check if the request User-Agent matches a known AI crawler."""
    ua = request.META.get("HTTP_USER_AGENT", "")
    return any(pattern.lower() in ua.lower() for pattern in AI_BOT_PATTERNS)


@hooks.register("before_serve_page")
def block_ai_bots(page, request, serve_args, serve_kwargs):
    """Block AI crawlers before Wagtail renders the page."""
    if is_ai_bot(request):
        return HttpResponse("Forbidden", status=403)
    # Return None to let Wagtail proceed with page rendering
    return None

Return None to continue: The before_serve_page hook must return None (or nothing) to let Wagtail continue serving the page. Any HttpResponse object returned from the hook is used as the final response — Wagtail stops processing immediately.
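That contract can be pictured with a tiny toy model (plain Python, not Wagtail's real signatures): a non-None return short-circuits serving, None lets rendering proceed.

```python
# Toy model of the before_serve_page contract (not Wagtail's real signature):
# a returned response short-circuits serving; None lets rendering proceed
def serve_page(request, hook, render):
    result = hook(request)
    if result is not None:
        return result          # hook supplied the final response
    return render(request)     # hook returned None: render normally

block_bots = lambda req: "403 Forbidden" if "GPTBot" in req else None
render = lambda req: "200 OK (rendered page)"

print(serve_page("UA: GPTBot/1.0", block_bots, render))   # → 403 Forbidden
print(serve_page("UA: Mozilla/5.0", block_bots, render))  # → 200 OK (rendered page)
```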

Logging blocked requests

Add logging to monitor AI crawler activity:

import logging

logger = logging.getLogger(__name__)


@hooks.register("before_serve_page")
def block_ai_bots(page, request, serve_args, serve_kwargs):
    if is_ai_bot(request):
        ua = request.META.get("HTTP_USER_AGENT", "unknown")
        logger.warning(
            "AI bot blocked: %s | page: %s | UA: %s",
            request.path,
            page.slug,
            ua,
        )
        return HttpResponse("Forbidden", status=403)
    return None

Configure Django logging in settings.py:

LOGGING = {
    "version": 1,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        "myapp": {"handlers": ["console"], "level": "WARNING"},
    },
}

Combined hooks file

A complete wagtail_hooks.py with both X-Robots-Tag and hard 403:

# myapp/wagtail_hooks.py
import logging
from django.http import HttpResponse
from wagtail import hooks

logger = logging.getLogger(__name__)

AI_BOT_PATTERNS = [
    "GPTBot", "ClaudeBot", "Claude-Web", "anthropic-ai",
    "CCBot", "Google-Extended", "PerplexityBot", "Applebot-Extended",
    "Amazonbot", "meta-externalagent", "Bytespider", "Diffbot",
    "YouBot", "cohere-ai",
]


def is_ai_bot(request):
    ua = request.META.get("HTTP_USER_AGENT", "")
    return any(p.lower() in ua.lower() for p in AI_BOT_PATTERNS)


@hooks.register("before_serve_page")
def block_ai_bots(page, request, serve_args, serve_kwargs):
    if is_ai_bot(request):
        logger.warning("AI bot blocked: %s | UA: %s", request.path,
                       request.META.get("HTTP_USER_AGENT", ""))
        return HttpResponse("Forbidden", status=403)
    return None


@hooks.register("after_serve_page")
def add_robots_header(page, request, response):
    robots_value = getattr(page, "robots_tag", "") or "noai, noimageai"
    content_type = response.get("Content-Type", "")
    if "text/html" in content_type:
        response["X-Robots-Tag"] = robots_value
    return response

6. Django middleware approach

Wagtail hooks only fire for CMS-managed page requests. To protect all requests — including custom Django views, admin routes, and API endpoints — use Django middleware. Middleware runs before Wagtail's routing layer.

AI bot middleware

Create myapp/middleware.py:

# myapp/middleware.py

AI_BOT_PATTERNS = [
    "GPTBot", "ClaudeBot", "Claude-Web", "anthropic-ai",
    "CCBot", "Google-Extended", "PerplexityBot", "Applebot-Extended",
    "Amazonbot", "meta-externalagent", "Bytespider", "Diffbot",
    "YouBot", "cohere-ai",
]

# Paths that should bypass bot protection
EXEMPT_PATHS = ["/robots.txt", "/sitemap.xml", "/favicon.ico"]


class AIBotMiddleware:
    """Block AI training crawlers at the Django middleware level."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        path = request.path_info

        # Skip protection for essential crawler files
        if any(path.startswith(exempt) for exempt in EXEMPT_PATHS):
            return self.get_response(request)

        ua = request.META.get("HTTP_USER_AGENT", "")
        if self._is_ai_bot(ua):
            from django.http import HttpResponse
            return HttpResponse("Forbidden", status=403)

        response = self.get_response(request)
        response["X-Robots-Tag"] = "noai, noimageai"
        return response

    @staticmethod
    def _is_ai_bot(ua: str) -> bool:
        ua_lower = ua.lower()
        return any(p.lower() in ua_lower for p in AI_BOT_PATTERNS)

Register the middleware in settings.py:

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "myapp.middleware.AIBotMiddleware",  # Add early in the chain
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ... rest of middleware ...
    "wagtail.contrib.redirects.middleware.RedirectMiddleware",
]

Middleware vs hooks: If you implement both the Django middleware and the Wagtail before_serve_page hook, the middleware runs first (and will block the request before the hook fires). Avoid duplicating logic — use middleware for site-wide blocking, or hooks for page-specific logic with access to the page object. Pick one approach and stick with it.

Exempt the Wagtail admin

If your team uses Wagtail's admin interface, ensure admin requests are not blocked:

EXEMPT_PATHS = [
    "/robots.txt",
    "/sitemap.xml",
    "/favicon.ico",
    "/admin/",       # Django admin
    "/cms/",         # Wagtail admin (default: /admin/ or /cms/ depending on your urls.py)
]

Check your urls.py for the Wagtail admin path — it's configured by path("cms/", include(wagtailadmin_urls)) or similar.
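The startswith() exemption check in AIBotMiddleware can be exercised on its own — a quick standalone sketch:

```python
# The startswith() exemption check from AIBotMiddleware, standalone
EXEMPT_PATHS = ["/robots.txt", "/sitemap.xml", "/favicon.ico", "/admin/", "/cms/"]

def is_exempt(path: str) -> bool:
    return any(path.startswith(prefix) for prefix in EXEMPT_PATHS)

print(is_exempt("/cms/pages/5/edit/"))  # → True
print(is_exempt("/blog/my-post/"))      # → False
```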

7. Deployment comparison

Wagtail is a WSGI application — deployed via Gunicorn or uWSGI, typically behind Nginx. All four protection layers work regardless of deployment target because blocking happens in Python code. The differences are in how robots.txt is served most efficiently.

Platform               robots.txt   Meta tags   X-Robots-Tag   Hard 403
Nginx + Gunicorn       ✓            ✓           ✓              ✓
Wagtail Cloud          ✓            ✓           ✓              ✓
Heroku                 ✓            ✓           ✓              ✓
Railway                ✓            ✓           ✓              ✓
Docker / self-hosted   ✓            ✓           ✓              ✓
Fly.io                 ✓            ✓           ✓              ✓

Because Wagtail runs a Python application server on all deployment targets, all protection layers work universally. The Nginx + Gunicorn setup gives you the additional option of serving robots.txt directly at the Nginx level (zero Python overhead), which is useful at high traffic volumes.

Nginx robots.txt — production setup

# /etc/nginx/sites-available/mysite.conf
upstream wagtail {
    server 127.0.0.1:8000;
}

server {
    listen 443 ssl;
    server_name example.com;

    # Serve robots.txt without touching Gunicorn
    location = /robots.txt {
        root /var/www/mysite/public;
        add_header Content-Type "text/plain; charset=utf-8";
        expires 1d;
        access_log off;
    }

    # All other requests go to Wagtail/Gunicorn
    location / {
        proxy_pass http://wagtail;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Wagtail Cloud

Wagtail Cloud is the managed hosting platform for Wagtail. All Django middleware and Wagtail hooks work identically — no platform-specific configuration needed. Deploy your code, and the AI bot protection is active immediately.

FAQ

How do I serve robots.txt in Wagtail?

The simplest approach is a Django TemplateView. Create robots.txt in a directory your template engine can see (one of TEMPLATES "DIRS", or an app's templates/ folder), then wire it in your root urls.py: path("robots.txt", TemplateView.as_view(template_name="robots.txt", content_type="text/plain")). Place the robots.txt path before the Wagtail catch-all URL pattern. For environment-aware output, use a custom view function that checks settings.DEBUG. For high-traffic sites, configure Nginx to serve robots.txt as a static file — faster, and it avoids a Django request entirely.

How do I add noai meta tags to every Wagtail page?

Edit your base.html template and add <meta name="robots" content="{{ page.robots_tag|default:'noai, noimageai' }}"> inside <head>. If you add a robots_tag CharField to your base Page model with a blank default, editors can override it per page in the Wagtail admin. Without the field, use a hardcoded default: <meta name="robots" content="noai, noimageai">.

What is the Wagtail before_serve_page hook?

The before_serve_page hook fires before Wagtail renders any CMS-managed page. Register it in wagtail_hooks.py with @hooks.register("before_serve_page"). The hook receives the request, page, and serve args/kwargs. If the function returns an HttpResponse, Wagtail uses that response instead of rendering the page — return HttpResponse("Forbidden", status=403) to block the request. Return None to let Wagtail proceed.

How is Wagtail different from plain Django for AI bot blocking?

Plain Django handles all routing via urls.py and view functions. Wagtail intercepts page requests through its own routing layer before they reach custom Django views. This gives you two complementary interception points: Django middleware (runs for all requests — admin, API, custom views, Wagtail pages) and Wagtail hooks (run only for Wagtail-managed page requests, with access to the page object for per-page logic). Use middleware for broad protection and hooks for page-specific behavior.

How do I give editors per-page control over AI bot indexing?

Add a CharField named robots_tag to your base Page model with blank=True, default="". Add it to promote_panels so it appears in the Promote tab of the editor alongside Slug and SEO title. In your base template, use {{ page.robots_tag|default:'noai, noimageai' }}. Run makemigrations and migrate after adding the field. Editors can then set values like index, follow for pages that should be indexed by AI search engines.

Does blocking AI bots in Wagtail affect Googlebot?

No — if implemented correctly. The robots.txt Disallow rules and User-Agent checks in this guide target AI training crawlers specifically: GPTBot, ClaudeBot, CCBot, Google-Extended, PerplexityBot, and others. Googlebot and Bingbot use different user agent strings and are not matched. Always include explicit Allow rules for Googlebot and Bingbot in your robots.txt to make your intent clear.
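A quick standalone check of the substring matching used throughout this guide (User-Agent strings abbreviated for illustration):

```python
# Substring matching, standalone: AI crawler patterns match their own UAs
# but not Googlebot's (User-Agent strings abbreviated for illustration)
AI_BOT_PATTERNS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended", "PerplexityBot"]

def is_ai_bot_ua(ua: str) -> bool:
    ua_lower = ua.lower()
    return any(p.lower() in ua_lower for p in AI_BOT_PATTERNS)

googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
gptbot = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"

print(is_ai_bot_ua(googlebot))  # → False ("Google-Extended" does not match "Googlebot")
print(is_ai_bot_ua(gptbot))     # → True
```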
