
How to Block AI Bots on Pelican: Complete 2026 Guide

Pelican is one of the most popular Python static site generators, used for developer blogs, documentation, and content-heavy sites. It generates a static output directory that can be deployed anywhere. Bot blocking therefore splits across two layers: the content layer (Pelican templates, static files, plugins) and the hosting platform layer.

robots.txt via EXTRA_PATH_METADATA

Pelican copies files listed in STATIC_PATHS to the output directory. The cleanest approach is EXTRA_PATH_METADATA: keep robots.txt in a directory listed in STATIC_PATHS (such as extra/) and map it to a specific output path:

pelicanconf.py

# pelicanconf.py

# Tell Pelican to copy the 'extra/' directory
STATIC_PATHS = ['images', 'extra']

# Map extra/robots.txt → output/robots.txt
EXTRA_PATH_METADATA = {
    'extra/robots.txt': {'path': 'robots.txt'},
}

extra/robots.txt

User-agent: *
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /

Sitemap: https://example.com/sitemap.xml

After running pelican content, confirm that output/robots.txt exists.
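
A short Python check works too (paths assume the default output directory):

check_robots.py

# run after 'pelican content' to confirm robots.txt reached the output root
from pathlib import Path

robots = Path("output/robots.txt")
assert robots.exists(), "robots.txt missing; check STATIC_PATHS and EXTRA_PATH_METADATA"
print(robots.read_text()[:200])  # print the first lines as a sanity check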

Alternative — theme static/ directory: You can also place robots.txt in your theme's static/ directory. Pelican copies the entire theme static/ to output/theme/ — but this puts it at output/theme/robots.txt, not the root. Use EXTRA_PATH_METADATA to ensure it lands at output/robots.txt.

Dynamic robots.txt via plugin

For environment-based robots.txt (strict in production, permissive in staging), write a small Pelican plugin using the finalized signal:

plugins/robots_generator.py

import os
from pelican import signals


AI_BOTS = """
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
"""

PERMISSIVE = """
User-agent: GPTBot
Allow: /
"""


def generate_robots(pelican):
    is_prod = os.environ.get("PELICAN_ENV") == "production"
    siteurl = pelican.settings.get("SITEURL", "https://example.com")

    content = f"User-agent: *\nAllow: /\n"
    content += AI_BOTS if is_prod else PERMISSIVE
    content += f"\nSitemap: {siteurl}/sitemap.xml\n"

    output_path = pelican.settings.get("OUTPUT_PATH", "output")
    robots_path = os.path.join(output_path, "robots.txt")

    with open(robots_path, "w") as f:
        f.write(content)


def register():
    signals.finalized.connect(generate_robots)
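
To sanity-check the plugin without a full build, you can call generate_robots directly with a minimal stand-in for the Pelican object. This is an ad-hoc test sketch, not part of Pelican's API: run it from the plugins/ directory, and note that the SimpleNamespace stand-in is purely illustrative.

# ad-hoc check: call generate_robots with a stand-in Pelican object
import os
from types import SimpleNamespace
from robots_generator import generate_robots

os.makedirs("output", exist_ok=True)
os.environ["PELICAN_ENV"] = "production"
fake = SimpleNamespace(settings={"SITEURL": "https://example.com",
                                 "OUTPUT_PATH": "output"})
generate_robots(fake)
print(open("output/robots.txt").read())  # should list the blocked bots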

pelicanconf.py — register plugin

# pelicanconf.py
PLUGIN_PATHS = ['plugins']
PLUGINS = ['robots_generator']

# Pelican 4.5+ also auto-discovers namespace plugins (pelican.plugins.*)
# installed via pip when PLUGINS is left unset; a local plugin like this
# one needs PLUGIN_PATHS and PLUGINS as above.

Build commands

# Production — AI bots blocked
PELICAN_ENV=production pelican content

# Development — permissive
pelican content

finalized signal: Fires after all content has been generated and written to output. Using it for robots.txt ensures the output directory exists before you write. Earlier signals such as initialized fire before the output directory is created.

noai meta tag in base.html

Pelican themes use Jinja2 templates. The base template (base.html) wraps all pages — add the noai meta tag there with a per-article/page fallback chain:

themes/[theme]/templates/base.html

<!DOCTYPE html>
<html lang="{{ DEFAULT_LANG }}">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>{% block title %}{{ SITENAME }}{% endblock %}</title>

  {# AI bot meta tag — per-article/page override via metadata field #}
  {% if article is defined and article.robots %}
    <meta name="robots" content="{{ article.robots }}">
  {% elif page is defined and page.robots %}
    <meta name="robots" content="{{ page.robots }}">
  {% else %}
    <meta name="robots" content="noai, noimageai">
  {% endif %}

  {% block head %}{% endblock %}
</head>
<body>
  {% block content %}{% endblock %}
</body>
</html>

Theme location: Edit your theme's templates/base.html. If using a third-party theme, copy it into your project first (pelican-themes -l -v lists installed themes with their paths): cp -r $(pelican-themes -l -v | grep mytheme) ./themes/mytheme, then set THEME = 'themes/mytheme' in pelicanconf.py. Never edit themes in the system-wide install path; they get overwritten on update.

Simpler approach — no per-article override

If you just need the same tag on every page:

<meta name="robots" content="noai, noimageai">

Using a Pelican setting for the default value

{# pelicanconf.py: ROBOTS_META = "noai, noimageai" #}
<meta name="robots" content="{{ ROBOTS_META | default('noai, noimageai') }}">

Per-article/page override

Pelican reads metadata from the header of each content file. Add a robots field to override the default:

RST article — default (no robots field)

My Article Title
================

:date: 2026-01-01
:category: Blog
:tags: python, pelican

Article content here. Default robots meta applies: "noai, noimageai".

RST article — allow indexing but no AI training

My Public Article
=================

:date: 2026-01-01
:robots: index, follow, noai, noimageai

Article content.

RST article — allow everything

Landing Page
============

:date: 2026-01-01
:robots: index, follow

Content.

Markdown article (requires markdown metadata extension)

Title: My Article
Date: 2026-01-01
robots: index, follow, noai, noimageai

Article content here.

Markdown metadata: Pelican's Markdown reader parses metadata from the header block at the top of the file (everything before the first blank line). Field names are case-insensitive. This requires the meta extension (markdown.extensions.meta), which is included in Pelican's default MARKDOWN setting.
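
To see how the meta extension parses these fields, you can run it directly with a few lines of Python (illustrative; Pelican does this internally):

# demonstrate Markdown 'meta' extension parsing (keys are lowercased)
import markdown

md = markdown.Markdown(extensions=["meta"])
md.convert("Title: My Article\nrobots: index, follow, noai\n\nBody text.")
print(md.Meta)  # {'title': ['My Article'], 'robots': ['index, follow, noai']}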

X-Robots-Tag via hosting platform

X-Robots-Tag is an HTTP response header — Pelican generates static files. Add it at the hosting layer.

Netlify — netlify.toml

[build]
  command = "pelican content -s pelicanconf.py"
  publish = "output"

[[headers]]
  for = "/*"
  [headers.values]
    X-Robots-Tag = "noai, noimageai"

Vercel — vercel.json

{
  "buildCommand": "pelican content -s pelicanconf.py",
  "outputDirectory": "output",
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        {
          "key": "X-Robots-Tag",
          "value": "noai, noimageai"
        }
      ]
    }
  ]
}

Cloudflare Pages — via EXTRA_PATH_METADATA

Cloudflare Pages reads a _headers file from the root of the published directory. Map it via EXTRA_PATH_METADATA:

pelicanconf.py

STATIC_PATHS = ['images', 'extra']
EXTRA_PATH_METADATA = {
    'extra/robots.txt': {'path': 'robots.txt'},
    'extra/_headers':   {'path': '_headers'},
}

extra/_headers

/*
  X-Robots-Tag: noai, noimageai

GitHub Pages

GitHub Pages does not support custom HTTP headers. The noai meta tag in base.html is your only option. For X-Robots-Tag or hard 403 blocking, migrate to Netlify, Vercel, or Cloudflare Pages.
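
Whichever platform serves the header, you can verify it on the deployed site with a short Python check (example.com is a placeholder for your deployed URL):

check_header.py

# verify the X-Robots-Tag response header on the live site
from urllib.request import Request, urlopen

url = "https://example.com/"  # replace with your deployed URL
resp = urlopen(Request(url, method="HEAD"))
print(resp.headers.get("X-Robots-Tag"))  # expect: noai, noimageai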

Hard 403 via edge functions

Netlify Edge Function

Create netlify/edge-functions/block-ai-bots.ts:

import type { Context } from '@netlify/edge-functions';

const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export default async function handler(
  request: Request,
  context: Context
): Promise<Response> {
  const ua = request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
}

export const config = { path: '/*' };

Register in netlify.toml:

[[edge_functions]]
  path = "/*"
  function = "block-ai-bots"

Cloudflare Pages Functions

Create functions/_middleware.ts at project root:

import type { PagesFunction } from '@cloudflare/workers-types';

const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export const onRequest: PagesFunction = async (context) => {
  const ua = context.request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
};
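
After deploying either function, a quick way to confirm the block is to request a page with an AI bot user agent (example.com is again a placeholder):

check_block.py

# confirm AI bot user agents receive 403 from the edge function
from urllib.request import Request, urlopen
from urllib.error import HTTPError

req = Request("https://example.com/", headers={"User-Agent": "GPTBot/1.0"})
try:
    urlopen(req)
    print("not blocked")
except HTTPError as e:
    print(e.code)  # expect: 403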

Deployment quick-reference

| Platform | Build command | Publish dir | Custom headers | Edge functions |
|---|---|---|---|---|
| Netlify | pelican content | output | ✅ netlify.toml | ✅ netlify/edge-functions/ |
| Vercel | pelican content | output | ✅ vercel.json | ⚠️ Next.js required |
| Cloudflare Pages | pelican content | output | ✅ extra/_headers via EXTRA_PATH_METADATA | ✅ functions/_middleware.ts |
| GitHub Pages | CI: pelican content | output | 🚫 No | 🚫 No |

Full pelicanconf.py

# pelicanconf.py
AUTHOR = 'Your Name'
SITENAME = 'My Site'
SITEURL = 'https://example.com'

PATH = 'content'
TIMEZONE = 'UTC'
DEFAULT_LANG = 'en'

# Theme
THEME = 'themes/mytheme'

# Static files — include extra/ for robots.txt and _headers
STATIC_PATHS = ['images', 'extra']
EXTRA_PATH_METADATA = {
    'extra/robots.txt': {'path': 'robots.txt'},
    'extra/_headers':   {'path': '_headers'},   # for Cloudflare Pages
}

# Plugins
PLUGIN_PATHS = ['plugins']
PLUGINS = []  # add 'robots_generator' for dynamic robots.txt

# Custom settings readable in templates
ROBOTS_META = 'noai, noimageai'

# Feed settings
FEED_ALL_ATOM = 'feeds/all.atom.xml'
CATEGORY_FEED_ATOM = 'feeds/{slug}.atom.xml'

# URL structure
ARTICLE_URL = '{category}/{slug}/'
ARTICLE_SAVE_AS = '{category}/{slug}/index.html'
PAGE_URL = '{slug}/'
PAGE_SAVE_AS = '{slug}/index.html'

FAQ

How do I add robots.txt to a Pelican site?

Use EXTRA_PATH_METADATA in pelicanconf.py: create extra/robots.txt, add 'extra' to STATIC_PATHS, and map "extra/robots.txt": {"path": "robots.txt"}. This copies it to output/robots.txt at the site root.

How do I add the noai meta tag to every Pelican page?

Edit themes/[theme]/templates/base.html. Add a Jinja2 conditional that reads article.robots or page.robots with a fallback of noai, noimageai. Copy the theme to a local directory first — never edit system-wide theme files.

How do I override robots on a specific Pelican article?

Add :robots: index, follow, noai, noimageai to the RST article header, or robots: ... to the Markdown metadata block. The template reads article.robots and falls back to the global default.

How do I add X-Robots-Tag to a Pelican site?

At the hosting layer. Netlify: netlify.toml. Vercel: vercel.json. Cloudflare Pages: extra/_headers mapped via EXTRA_PATH_METADATA.

Can I write a Pelican plugin to generate robots.txt dynamically?

Yes — connect to pelican.signals.finalized (fires after all output is written) and write robots.txt to the output path. Register with PLUGINS = ['robots_generator'] in pelicanconf.py.

Where is the base.html template in Pelican?

In your theme: themes/[theme-name]/templates/base.html. Copy the theme to a local path before editing. Set THEME = 'themes/mytheme' in pelicanconf.py. Hugo's lookup order doesn't apply here — you edit the theme file directly.
