How to Block AI Bots on Pelican: Complete 2026 Guide
Pelican is one of the most popular Python static site generators, widely used for developer blogs, documentation, and content-heavy sites. It generates a static output directory that can be deployed anywhere. Bot blocking therefore splits across two layers: the content layer (Pelican templates, static files, plugins) and the hosting platform layer.
robots.txt via EXTRA_PATH_METADATA
Pelican copies files listed in STATIC_PATHS to the output directory. The cleanest approach is EXTRA_PATH_METADATA — it lets you place the file anywhere in your project and map it to a specific output path:
pelicanconf.py
# pelicanconf.py
# Tell Pelican to copy the 'extra/' directory
STATIC_PATHS = ['images', 'extra']
# Map extra/robots.txt → output/robots.txt
EXTRA_PATH_METADATA = {
    'extra/robots.txt': {'path': 'robots.txt'},
}

extra/robots.txt
User-agent: *
Allow: /
# Block AI training crawlers
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: YouBot
Disallow: /
Sitemap: https://example.com/sitemap.xml

After running pelican content, confirm that output/robots.txt exists.
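You can confirm more than the file's existence: the stdlib urllib.robotparser will tell you whether the rules actually block a given crawler. A self-contained sketch — the rules are inlined here for illustration; in practice, read output/robots.txt after a build:

```python
from urllib.robotparser import RobotFileParser

# A subset of the rules from extra/robots.txt, inlined so this runs standalone.
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Blocked crawlers are denied everywhere; ordinary agents fall through to '*'.
print(parser.can_fetch("GPTBot", "/posts/my-article/"))       # False
print(parser.can_fetch("ClaudeBot", "/"))                     # False
print(parser.can_fetch("SomeBrowser", "/posts/my-article/"))  # True
```

To check the real build output, replace the inlined string with the contents of output/robots.txt.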
Don't put robots.txt in your theme's static/ directory. Pelican copies the entire theme static/ tree to output/theme/, which would put the file at output/theme/robots.txt, not the site root. Use EXTRA_PATH_METADATA to ensure it lands at output/robots.txt.

Dynamic robots.txt via plugin
For environment-based robots.txt (strict in production, permissive in staging), write a small Pelican plugin using the finalized signal:
plugins/robots_generator.py
import os
from pelican import signals
AI_BOTS = """
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
"""
PERMISSIVE = """
User-agent: GPTBot
Allow: /
"""
def generate_robots(pelican):
    is_prod = os.environ.get("PELICAN_ENV") == "production"
    siteurl = pelican.settings.get("SITEURL", "https://example.com")
    content = "User-agent: *\nAllow: /\n"
    content += AI_BOTS if is_prod else PERMISSIVE
    content += f"\nSitemap: {siteurl}/sitemap.xml\n"
    output_path = pelican.settings.get("OUTPUT_PATH", "output")
    robots_path = os.path.join(output_path, "robots.txt")
    with open(robots_path, "w") as f:
        f.write(content)
def register():
    signals.finalized.connect(generate_robots)

pelicanconf.py — register plugin
# pelicanconf.py
PLUGIN_PATHS = ['plugins']
PLUGINS = ['robots_generator']
# Or using pelican-plugins package:
# PLUGINS = ['pelican.plugins.robots_generator']

Build commands
# Production — AI bots blocked
PELICAN_ENV=production pelican content
# Development — permissive
pelican content

Generating robots.txt in the finalized signal ensures the output directory exists before you write to it. Earlier signals like initialized fire before the output directory is created.

noai meta tag in base.html
Pelican themes use Jinja2 templates. The base template (base.html) wraps all pages — add the noai meta tag there with a per-article/page fallback chain:
themes/[theme]/templates/base.html
<!DOCTYPE html>
<html lang="{{ DEFAULT_LANG }}">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>{% block title %}{{ SITENAME }}{% endblock %}</title>
  {# AI bot meta tag — per-article/page override via metadata field #}
  {% if article is defined and article.robots %}
  <meta name="robots" content="{{ article.robots }}">
  {% elif page is defined and page.robots %}
  <meta name="robots" content="{{ page.robots }}">
  {% else %}
  <meta name="robots" content="noai, noimageai">
  {% endif %}
  {% block head %}{% endblock %}
</head>
<body>
  {% block content %}{% endblock %}
</body>
</html>

This file lives at themes/[theme]/templates/base.html. If you're using a third-party theme, copy it to a local directory first (locate the installed path with pelican-themes -l -v, then cp -r that path to ./themes/mytheme) and set THEME = 'themes/mytheme' in pelicanconf.py. Never edit themes in the system-wide install path; they get overwritten on update.

Simpler approach — no per-article override
If you just need the same tag on every page:
<meta name="robots" content="noai, noimageai">

Using a Pelican setting for the default value
{# pelicanconf.py: ROBOTS_META = "noai, noimageai" #}
<meta name="robots" content="{{ ROBOTS_META | default('noai, noimageai') }}">

Per-article/page override
Pelican reads metadata from the header of each content file. Add a robots field to override the default:
RST article — default (no robots field)
My Article Title
================
:date: 2026-01-01
:category: Blog
:tags: python, pelican
Article content here. The default robots meta applies: "noai, noimageai".

RST article — allow indexing but no AI training
My Public Article
=================
:date: 2026-01-01
:robots: index, follow, noai, noimageai
Article content.

RST article — allow everything
Landing Page
============
:date: 2026-01-01
:robots: index, follow
Content.

Markdown article (requires markdown metadata extension)
Title: My Article
Date: 2026-01-01
robots: index, follow, noai, noimageai
Article content here.

Metadata parsing requires the meta extension (markdown.extensions.meta) in MARKDOWN['extensions'] in pelicanconf.py; it's enabled by default in most Pelican setups.

X-Robots-Tag via hosting platform
X-Robots-Tag is an HTTP response header, and Pelican only generates static files, so the header has to be added at the hosting layer.
Netlify — netlify.toml
[build]
  command = "pelican content -s pelicanconf.py"
  publish = "output"

[[headers]]
  for = "/*"
  [headers.values]
    X-Robots-Tag = "noai, noimageai"

Vercel — vercel.json
{
  "buildCommand": "pelican content -s pelicanconf.py",
  "outputDirectory": "output",
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        {
          "key": "X-Robots-Tag",
          "value": "noai, noimageai"
        }
      ]
    }
  ]
}

Cloudflare Pages — via EXTRA_PATH_METADATA
Cloudflare Pages reads a _headers file from the root of the published directory. Map it via EXTRA_PATH_METADATA:
pelicanconf.py
STATIC_PATHS = ['images', 'extra']
EXTRA_PATH_METADATA = {
    'extra/robots.txt': {'path': 'robots.txt'},
    'extra/_headers': {'path': '_headers'},
}

extra/_headers
/*
  X-Robots-Tag: noai, noimageai

GitHub Pages
GitHub Pages serves static files with fixed headers, so robots.txt plus the noai meta tag in base.html is your only option. For X-Robots-Tag or hard 403 blocking, migrate to Netlify, Vercel, or Cloudflare Pages.

Hard 403 via edge functions
Netlify Edge Function
Create netlify/edge-functions/block-ai-bots.ts:
import type { Context } from '@netlify/edge-functions';
const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export default async function handler(
  request: Request,
  context: Context
): Promise<Response> {
  const ua = request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
}

export const config = { path: '/*' };

Register in netlify.toml:
[[edge_functions]]
path = "/*"
function = "block-ai-bots"

Cloudflare Pages Functions
Create functions/_middleware.ts at project root:
import type { PagesFunction } from '@cloudflare/workers-types';
const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export const onRequest: PagesFunction = async (context) => {
  const ua = context.request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
};

Deployment quick-reference
| Platform | Build command | Publish dir | Custom headers | Edge functions |
|---|---|---|---|---|
| Netlify | pelican content | output | ✅ netlify.toml | ✅ netlify/edge-functions/ |
| Vercel | pelican content | output | ✅ vercel.json | ⚠️ Next.js required |
| Cloudflare Pages | pelican content | output | ✅ extra/_headers via EXTRA_PATH_METADATA | ✅ functions/_middleware.ts |
| GitHub Pages | CI: pelican content | output | 🚫 No | 🚫 No |
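Both edge functions implement the same check: a case-insensitive substring match of the User-Agent header against a blocklist. The equivalent logic in Python is handy for grepping server logs offline; is_ai_bot is my name for it, not a Pelican or platform API:

```python
# Same blocklist as the edge functions above.
AI_BOTS = [
    'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
    'Google-Extended', 'AhrefsBot', 'Bytespider',
    'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
    'PerplexityBot', 'YouBot',
]

def is_ai_bot(user_agent: str) -> bool:
    """Case-insensitive substring match, mirroring the TypeScript check."""
    ua = user_agent.lower()
    return any(bot.lower() in ua for bot in AI_BOTS)

print(is_ai_bot("Mozilla/5.0 (compatible; GPTBot/1.0)"))    # True
print(is_ai_bot("Mozilla/5.0 (Windows NT 10.0; Win64)"))    # False
```

Substring matching deliberately errs toward blocking: any UA that merely mentions a listed token is rejected, which matches how the edge functions behave.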
Full pelicanconf.py
# pelicanconf.py
AUTHOR = 'Your Name'
SITENAME = 'My Site'
SITEURL = 'https://example.com'
PATH = 'content'
TIMEZONE = 'UTC'
DEFAULT_LANG = 'en'
# Theme
THEME = 'themes/mytheme'
# Static files — include extra/ for robots.txt and _headers
STATIC_PATHS = ['images', 'extra']
EXTRA_PATH_METADATA = {
    'extra/robots.txt': {'path': 'robots.txt'},
    'extra/_headers': {'path': '_headers'},  # for Cloudflare Pages
}
# Plugins
PLUGIN_PATHS = ['plugins']
PLUGINS = [] # add 'robots_generator' for dynamic robots.txt
# Custom settings readable in templates
ROBOTS_META = 'noai, noimageai'
# Feed settings
FEED_ALL_ATOM = 'feeds/all.atom.xml'
CATEGORY_FEED_ATOM = 'feeds/{slug}.atom.xml'
# URL structure
ARTICLE_URL = '{category}/{slug}/'
ARTICLE_SAVE_AS = '{category}/{slug}/index.html'
PAGE_URL = '{slug}/'
PAGE_SAVE_AS = '{slug}/index.html'

FAQ
How do I add robots.txt to a Pelican site?
Use EXTRA_PATH_METADATA in pelicanconf.py: create extra/robots.txt, add 'extra' to STATIC_PATHS, and map "extra/robots.txt": {"path": "robots.txt"}. This copies it to output/robots.txt at the site root.
How do I add the noai meta tag to every Pelican page?
Edit themes/[theme]/templates/base.html. Add a Jinja2 conditional that reads article.robots or page.robots with a fallback of noai, noimageai. Copy the theme to a local directory first — never edit system-wide theme files.
How do I override robots on a specific Pelican article?
Add :robots: index, follow, noai, noimageai to the RST article header, or robots: ... to the Markdown metadata block. The template reads article.robots and falls back to the global default.
How do I add X-Robots-Tag to a Pelican site?
At the hosting layer. Netlify: netlify.toml. Vercel: vercel.json. Cloudflare Pages: extra/_headers mapped via EXTRA_PATH_METADATA.
Can I write a Pelican plugin to generate robots.txt dynamically?
Yes — connect to pelican.signals.finalized (fires after all output is written) and write robots.txt to the output path. Register with PLUGINS = ['robots_generator'] in pelicanconf.py.
Where is the base.html template in Pelican?
In your theme: themes/[theme-name]/templates/base.html. Copy the theme to a local path before editing. Set THEME = 'themes/mytheme' in pelicanconf.py. Hugo's lookup order doesn't apply here — you edit the theme file directly.
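A final local sanity check after pelican content can tie the layers together: robots.txt must sit at the output root, and every generated HTML page should carry a robots meta tag. A sketch — check_output is a hypothetical helper, and the "output" default assumes the standard OUTPUT_PATH:

```python
import os

def check_output(output_dir: str = "output") -> list[str]:
    """Return a list of problems found in a built Pelican output directory."""
    problems = []
    # 1. robots.txt must be at the output root, not under theme/.
    if not os.path.isfile(os.path.join(output_dir, "robots.txt")):
        problems.append("missing robots.txt at output root")
    # 2. Every generated HTML page should carry a robots meta tag.
    for root, _dirs, files in os.walk(output_dir):
        for name in files:
            if not name.endswith(".html"):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8") as f:
                html = f.read()
            if 'name="robots"' not in html:
                problems.append(f"no robots meta tag: {path}")
    return problems

# Usage after a build: problems = check_output("output")
```

An empty list means both protections survived the build; anything else names the offending file.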