
How to Block AI Bots on Sphinx: Complete 2026 Guide

Sphinx generates a static HTML documentation site — no server process at runtime. Used by CPython, Django, NumPy, and thousands of open-source projects, it's the de facto standard for Python documentation. Bot blocking splits across the Sphinx build layer (conf.py, templates, _static/) and the hosting platform layer.

robots.txt via html_extra_path

Sphinx does not place files from _static/ at the output root — it copies them to _build/html/_static/. To get robots.txt at the root of your built site, use html_extra_path in conf.py.

Common mistake: placing robots.txt in source/_static/ puts it at _build/html/_static/robots.txt, not at _build/html/robots.txt. Crawlers look for it at the root. Use html_extra_path instead.
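The difference between the two options can be pictured with a tiny helper. This is a toy model of where Sphinx copies files, not Sphinx's actual copy logic:

```python
from pathlib import PurePosixPath

def built_path(filename: str, option: str) -> str:
    """Toy illustration of where a file lands in the build output.

    html_extra_path entries are copied to the output root;
    html_static_path entries are copied under _static/.
    """
    root = PurePosixPath("_build/html")
    name = PurePosixPath(filename).name
    if option == "html_extra_path":
        return str(root / name)
    if option == "html_static_path":
        return str(root / "_static" / name)
    raise ValueError(f"unknown option: {option}")
```

So `built_path("robots.txt", "html_extra_path")` gives the root-level path crawlers expect, while the `html_static_path` variant buries it under _static/.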

conf.py

# conf.py
html_extra_path = ['robots.txt']

# Alternative: point to a directory
# html_extra_path = ['_extra']  # then create _extra/robots.txt

Create robots.txt in the same directory as conf.py (usually docs/robots.txt or source/robots.txt):

robots.txt

User-agent: *
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /

Sitemap: https://docs.example.com/sitemap.xml

After make html or sphinx-build -b html source _build/html, verify the file exists at _build/html/robots.txt.
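Beyond checking that the file exists, you can sanity-check its contents. Here is a minimal parser sketch; it assumes one User-agent line per group, as in the file above, whereas the real robots.txt grammar also allows multiple agents per group and other edge cases:

```python
def disallowed_agents(robots_txt: str) -> set[str]:
    """Return user-agents whose group contains a 'Disallow: /' rule.

    Simplified: assumes one User-agent line per group.
    """
    blocked = set()
    current = None
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            current = value
        elif field == "disallow" and value == "/" and current:
            blocked.add(current)
    return blocked
```

Running it over the built _build/html/robots.txt lets you assert that every bot on your list actually has a Disallow rule before you deploy.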

noai meta tag via _templates/layout.html

The most reliable way to add custom meta tags to every Sphinx page is to override the theme's base layout template. Create source/_templates/layout.html; the _templates directory lives in your Sphinx source directory and must be registered via templates_path in conf.py.

Register the templates directory in conf.py

# conf.py
templates_path = ['_templates']

source/_templates/layout.html (sphinx_rtd_theme / most themes)

{% extends "!layout.html" %}

{% block extrahead %}
  {{ super() }}
  <meta name="robots" content="noai, noimageai">
{% endblock %}

The ! prefix is critical: {% extends "!layout.html" %} tells Sphinx's template loader to use the original theme's layout.html as the parent. Without the !, the loader finds your override first, so the template extends itself and causes infinite recursion. Always use the ! prefix when overriding theme templates.

Call {{ super() }}: always call {{ super() }} inside the extrahead block to preserve the theme's existing head content (favicons, OG tags, theme stylesheets). Omitting it can drop critical theme assets.

Furo theme — layout.html

{% extends "!layout.html" %}

{% block extrahead %}
  {{ super() }}
  <meta name="robots" content="noai, noimageai">
{% endblock %}

Same syntax — Furo also uses the extrahead block name. This template override works with sphinx_rtd_theme, Furo, PyData Sphinx Theme, and the default Sphinx alabaster theme.

After adding the template

make html
# or
sphinx-build -b html source _build/html

# Verify the meta tag is present in the output
grep 'noai' _build/html/index.html
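The grep check can be made stricter with a small stdlib parser that looks for the actual meta tag rather than the bare string. This is a sketch to run over each file in _build/html:

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect the content of every <meta name="robots" ...> tag."""
    def __init__(self):
        super().__init__()
        self.robots_content = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if d.get("name") == "robots":
                self.robots_content.append(d.get("content") or "")

def has_noai_meta(html: str) -> bool:
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noai" in c for c in finder.robots_content)
```

Unlike grep, this won't be fooled by the word "noai" appearing in body text or code samples on the page.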

Global meta via html_meta in conf.py

Sphinx supports an html_meta dict in conf.py to inject meta tags into all pages. Support varies by theme, but it works with most modern themes.

conf.py

# conf.py
html_meta = {
    'robots': 'noai, noimageai',
}

Theme compatibility: html_meta is processed by Sphinx core and added to the page's <head> via the theme's metatags block. It works reliably with alabaster and PyData Sphinx Theme; for sphinx_rtd_theme, the _templates/layout.html override is more reliable. Confirm with grep 'noai' _build/html/index.html.

Per-page robots directives

For per-page control, use the .. meta:: directive in individual RST files or the :robots: front matter key in MyST Markdown files.

RST — per-page meta directive

.. meta::
   :robots: noai, noimageai

My Page Title
=============

Page content here.

MyST Markdown — front matter (with myst-parser)

---
myst:
  html_meta:
    robots: "noai, noimageai"
---

# My Page Title

Page content here.

Override to allow everything on a specific page

.. meta::
   :robots: index, follow

Public Page
===========

Combining global + per-page: set the global default in html_meta (conf.py) or _templates/layout.html, then use .. meta:: directives to override on specific pages. The per-page directive replaces the global value for that page.

X-Robots-Tag via hosting platform

X-Robots-Tag is an HTTP response header. Sphinx outputs static HTML files — no server to inject headers at runtime. Add the header at the hosting layer.

Netlify — netlify.toml

[build]
  command = "make html"
  publish = "_build/html"

[[headers]]
  for = "/*"
  [headers.values]
    X-Robots-Tag = "noai, noimageai"

Vercel — vercel.json

{
  "buildCommand": "make html",
  "outputDirectory": "_build/html",
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        {
          "key": "X-Robots-Tag",
          "value": "noai, noimageai"
        }
      ]
    }
  ]
}

Cloudflare Pages — _extra/_headers (via html_extra_path)

Cloudflare Pages reads a _headers file from the root of the published directory. Use html_extra_path to copy it there:

# conf.py
html_extra_path = ['robots.txt', '_headers']

Create _headers alongside conf.py:

/*
  X-Robots-Tag: noai, noimageai
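The _headers format is simple enough to lint before deploying. Here is a minimal parser sketch; it covers only the URL-pattern-plus-indented-headers shape shown above, not the full Cloudflare Pages syntax:

```python
def parse_headers_file(text: str) -> dict[str, dict[str, str]]:
    """Parse a _headers file into {url_pattern: {header_name: value}}.

    Simplified: unindented lines are URL patterns; indented lines are
    'Header-Name: value' pairs belonging to the pattern above them.
    """
    rules: dict[str, dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] not in " \t":        # unindented -> URL pattern
            current = line.strip()
            rules[current] = {}
        elif current is not None:       # indented -> header for current pattern
            name, _, value = line.strip().partition(":")
            rules[current][name.strip()] = value.strip()
    return rules
```

A quick assertion in CI that the parsed rules contain X-Robots-Tag under /* catches indentation mistakes before they silently disable the header.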

Read the Docs specifics

Read the Docs (RTD) is the most common hosting platform for Sphinx documentation. It has specific capabilities and limitations for bot blocking.

.readthedocs.yaml

# .readthedocs.yaml
version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.12"

sphinx:
  configuration: docs/conf.py

python:
  install:
    - requirements: docs/requirements.txt

RTD capabilities

| Feature | RTD Free | RTD Business |
| --- | --- | --- |
| noai meta tag (via template) | ✅ Yes | ✅ Yes |
| robots.txt via html_extra_path | ✅ Yes | ✅ Yes |
| Custom HTTP headers (X-Robots-Tag) | 🚫 No | ✅ Yes |
| Hard 403 UA blocking | 🚫 No | ⚠️ Limited |
| Custom domain | ✅ Yes | ✅ Yes |

RTD recommendation: for free-tier RTD hosting, use _templates/layout.html for the noai meta tag and html_extra_path for robots.txt. These are your only options at the free tier. For X-Robots-Tag or hard 403 blocking, upgrade to RTD Business or migrate to Netlify or Cloudflare Pages.

RTD addons — inject meta tags without template override

RTD Business accounts can inject custom HTML via the RTD addons system in .readthedocs.yaml. For free-tier projects, the template override is the only option.

Hard 403 via edge functions

For hard UA-based blocking (403 before any content is served), use an edge function. This requires hosting on Netlify or Cloudflare Pages.

Netlify Edge Function

Create netlify/edge-functions/block-ai-bots.ts in your project root (not inside the docs/ or Sphinx source directory):

import type { Context } from '@netlify/edge-functions';

const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export default async function handler(
  request: Request,
  context: Context
): Promise<Response> {
  const ua = request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
}

export const config = { path: '/*' };

Register it in netlify.toml:

[build]
  command = "make html"
  publish = "_build/html"

[[edge_functions]]
  path = "/*"
  function = "block-ai-bots"

[[headers]]
  for = "/*"
  [headers.values]
    X-Robots-Tag = "noai, noimageai"

Cloudflare Pages Functions

Create functions/_middleware.ts in your project root:

import type { PagesFunction } from '@cloudflare/workers-types';

const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export const onRequest: PagesFunction = async (context) => {
  const ua = context.request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
};
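Both edge functions share the same User-Agent matching rule: a case-insensitive substring match against a bot list. You can mirror it in Python to unit-test the list locally; keeping the two lists in sync is up to you (by hand or via code generation):

```python
# Same bot list as the TypeScript edge functions above.
AI_BOTS = [
    "GPTBot", "ClaudeBot", "anthropic-ai", "CCBot",
    "Google-Extended", "AhrefsBot", "Bytespider",
    "Amazonbot", "Diffbot", "FacebookBot", "cohere-ai",
    "PerplexityBot", "YouBot",
]

def is_ai_bot(user_agent: str) -> bool:
    """Case-insensitive substring match, same rule as the edge functions."""
    ua = user_agent.lower()
    return any(bot.lower() in ua for bot in AI_BOTS)
```

This makes it cheap to add regression tests when you extend the list, so a typo in a bot name is caught before it ships to the edge.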

Cloudflare Pages build config (dashboard)

Build command: make html
Build output directory: _build/html

Deployment quick-reference

| Platform | Build command | Publish dir | Custom headers | Edge functions |
| --- | --- | --- | --- | --- |
| Read the Docs (free) | Auto (RTD builds) | Auto | 🚫 No | 🚫 No |
| Read the Docs (Business) | Auto (RTD builds) | Auto | ✅ Yes | ⚠️ Limited |
| Netlify | make html | _build/html | ✅ netlify.toml | ✅ netlify/edge-functions/ |
| Vercel | make html | _build/html | ✅ vercel.json | ⚠️ Next.js required |
| Cloudflare Pages | make html | _build/html | ✅ _headers via html_extra_path | ✅ functions/_middleware.ts |
| GitHub Pages | CI: make html | _build/html | 🚫 No | 🚫 No |

Full conf.py example

# conf.py
import os
import sys

project = 'My Project'
author = 'My Team'
release = '1.0.0'

extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.viewcode',
    'myst_parser',          # if using Markdown
]

templates_path = ['_templates']
html_extra_path = ['robots.txt']   # copied to _build/html/robots.txt

html_theme = 'furo'                # or 'sphinx_rtd_theme', 'pydata_sphinx_theme'

# Global meta tags (works with most themes)
html_meta = {
    'robots': 'noai, noimageai',
}

# Theme options (theme-specific)
html_theme_options = {}

# Static files (CSS, JavaScript, images) — goes to _build/html/_static/
html_static_path = ['_static']

Makefile (standard)

# Minimal Makefile for Sphinx documentation

SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = source
BUILDDIR      = _build

.PHONY: help Makefile

%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

html: Makefile
	@$(SPHINXBUILD) -b html "$(SOURCEDIR)" "$(BUILDDIR)/html"
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

clean:
	rm -rf $(BUILDDIR)/*

FAQ

How do I add robots.txt to a Sphinx site?

Use html_extra_path in conf.py: html_extra_path = ['robots.txt']. Create robots.txt alongside conf.py. This copies it to _build/html/robots.txt — the root of your deployed site. Do not place it in _static/ — that copies to _build/html/_static/robots.txt, which crawlers will not find.

How do I add the noai meta tag to every Sphinx page?

Create source/_templates/layout.html with {% extends "!layout.html" %} and an extrahead block containing <meta name="robots" content="noai, noimageai">. Always call {{ super() }} in the block to preserve theme assets. Register the directory with templates_path = ['_templates'] in conf.py.

Can I add the noai meta tag without overriding the theme layout?

Yes — use html_meta = {"robots": "noai, noimageai"} in conf.py. Works with most modern themes. Test by checking the built HTML: grep noai _build/html/index.html.

How do I add X-Robots-Tag on a Sphinx site hosted on Read the Docs?

RTD free tier does not support custom HTTP headers. Use the noai meta tag (via template override or html_meta) as your primary protection. For X-Robots-Tag, upgrade to RTD Business or migrate to Netlify, Vercel, or Cloudflare Pages.

How do I block AI bots with hard 403 on a Sphinx site?

Use a Netlify Edge Function or Cloudflare Pages functions/_middleware.ts that checks User-Agent and returns 403 for known AI crawlers. Not available on Read the Docs or GitHub Pages.

Does the Sphinx html_meta conf.py option add noai tags?

Yes, but with caveats. html_meta works reliably with alabaster and PyData Sphinx Theme. For sphinx_rtd_theme, the _templates/layout.html override is more reliable. Always verify with grep noai _build/html/index.html after building.
