
How to Block AI Bots on Pelican: Complete 2026 Guide

Pelican is one of the most popular Python static site generators, used for developer blogs, documentation, and content-heavy sites. It generates a static output directory that can be deployed anywhere. Bot blocking therefore splits across two layers: the content layer (Pelican templates, static files, plugins) and the hosting platform layer.

robots.txt via EXTRA_PATH_METADATA

Pelican copies files listed in STATIC_PATHS to the output directory. The cleanest approach is EXTRA_PATH_METADATA: keep robots.txt in a directory listed in STATIC_PATHS (such as extra/) and map it to a specific output path:

pelicanconf.py

# pelicanconf.py

# Tell Pelican to copy the 'extra/' directory
STATIC_PATHS = ['images', 'extra']

# Map extra/robots.txt → output/robots.txt
EXTRA_PATH_METADATA = {
    'extra/robots.txt': {'path': 'robots.txt'},
}

extra/robots.txt

User-agent: *
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /

Sitemap: https://example.com/sitemap.xml

After running pelican content, confirm that output/robots.txt exists.
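
A short Python check works too (paths assume the default output directory):

check_robots.py

# run after 'pelican content' to confirm robots.txt reached the output root
from pathlib import Path

robots = Path("output/robots.txt")
assert robots.exists(), "robots.txt missing; check STATIC_PATHS and EXTRA_PATH_METADATA"
print(robots.read_text()[:200])  # print the first lines as a sanity check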

Alternative — theme static/ directory: You can also place robots.txt in your theme's static/ directory. Pelican copies the entire theme static/ to output/theme/ — but this puts it at output/theme/robots.txt, not the root. Use EXTRA_PATH_METADATA to ensure it lands at output/robots.txt.

Dynamic robots.txt via plugin

For environment-based robots.txt (strict in production, permissive in staging), write a small Pelican plugin using the finalized signal:

plugins/robots_generator.py

import os
from pelican import signals


AI_BOTS = """
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
"""

PERMISSIVE = """
User-agent: GPTBot
Allow: /
"""


def generate_robots(pelican):
    is_prod = os.environ.get("PELICAN_ENV") == "production"
    siteurl = pelican.settings.get("SITEURL", "https://example.com")

    content = f"User-agent: *\nAllow: /\n"
    content += AI_BOTS if is_prod else PERMISSIVE
    content += f"\nSitemap: {siteurl}/sitemap.xml\n"

    output_path = pelican.settings.get("OUTPUT_PATH", "output")
    robots_path = os.path.join(output_path, "robots.txt")

    with open(robots_path, "w") as f:
        f.write(content)


def register():
    signals.finalized.connect(generate_robots)
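
To sanity-check the plugin without a full build, you can call generate_robots directly with a minimal stand-in for the Pelican object. This is an ad-hoc test sketch, not part of Pelican's API: run it from the plugins/ directory, and note that the SimpleNamespace stand-in is purely illustrative.

# ad-hoc check: call generate_robots with a stand-in Pelican object
import os
from types import SimpleNamespace
from robots_generator import generate_robots

os.makedirs("output", exist_ok=True)
os.environ["PELICAN_ENV"] = "production"
fake = SimpleNamespace(settings={"SITEURL": "https://example.com",
                                 "OUTPUT_PATH": "output"})
generate_robots(fake)
print(open("output/robots.txt").read())  # should list the blocked bots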

pelicanconf.py — register plugin

# pelicanconf.py
PLUGIN_PATHS = ['plugins']
PLUGINS = ['robots_generator']

# Pelican 4.5+ also auto-discovers namespace plugins (pelican.plugins.*)
# installed via pip when PLUGINS is left unset; a local plugin like this
# one needs PLUGIN_PATHS and PLUGINS as above.

Build commands

# Production — AI bots blocked
PELICAN_ENV=production pelican content

# Development — permissive
pelican content

finalized signal: Fires after all content has been generated and written to output. Using it for robots.txt ensures the output directory exists before you write. Earlier signals such as initialized fire before the output directory is created.

noai meta tag in base.html

Pelican themes use Jinja2 templates. The base template (base.html) wraps all pages — add the noai meta tag there with a per-article/page fallback chain:

themes/[theme]/templates/base.html

<!DOCTYPE html>
<html lang="{{ DEFAULT_LANG }}">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>{% block title %}{{ SITENAME }}{% endblock %}</title>

  {# AI bot meta tag — per-article/page override via metadata field #}
  {% if article is defined and article.robots %}
    <meta name="robots" content="{{ article.robots }}">
  {% elif page is defined and page.robots %}
    <meta name="robots" content="{{ page.robots }}">
  {% else %}
    <meta name="robots" content="noai, noimageai">
  {% endif %}

  {% block head %}{% endblock %}
</head>
<body>
  {% block content %}{% endblock %}
</body>
</html>

Theme location: Edit your theme's templates/base.html. If using a third-party theme, copy it into your project first (pelican-themes -l -v lists installed themes with their paths): cp -r $(pelican-themes -l -v | grep mytheme) ./themes/mytheme, then set THEME = 'themes/mytheme' in pelicanconf.py. Never edit themes in the system-wide install path; they get overwritten on update.

Simpler approach — no per-article override

If you just need the same tag on every page:

<meta name="robots" content="noai, noimageai">

Using a Pelican setting for the default value

{# pelicanconf.py: ROBOTS_META = "noai, noimageai" #}
<meta name="robots" content="{{ ROBOTS_META | default('noai, noimageai') }}">

Per-article/page override

Pelican reads metadata from the header of each content file. Add a robots field to override the default:

RST article — default (no robots field)

My Article Title
================

:date: 2026-01-01
:category: Blog
:tags: python, pelican

Article content here. Default robots meta applies: "noai, noimageai".

RST article — allow indexing but no AI training

My Public Article
=================

:date: 2026-01-01
:robots: index, follow, noai, noimageai

Article content.

RST article — allow everything

Landing Page
============

:date: 2026-01-01
:robots: index, follow

Content.

Markdown article (requires markdown metadata extension)

Title: My Article
Date: 2026-01-01
robots: index, follow, noai, noimageai

Article content here.

Markdown metadata: Pelican's Markdown reader parses metadata from the header block at the top of the file (everything before the first blank line). Field names are case-insensitive. This requires the meta extension (markdown.extensions.meta), which is included in Pelican's default MARKDOWN setting.
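
To see how the meta extension parses these fields, you can run it directly with a few lines of Python (illustrative; Pelican does this internally):

# demonstrate Markdown 'meta' extension parsing (keys are lowercased)
import markdown

md = markdown.Markdown(extensions=["meta"])
md.convert("Title: My Article\nrobots: index, follow, noai\n\nBody text.")
print(md.Meta)  # {'title': ['My Article'], 'robots': ['index, follow, noai']}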

X-Robots-Tag via hosting platform

X-Robots-Tag is an HTTP response header — Pelican generates static files. Add it at the hosting layer.

Netlify — netlify.toml

[build]
  command = "pelican content -s pelicanconf.py"
  publish = "output"

[[headers]]
  for = "/*"
  [headers.values]
    X-Robots-Tag = "noai, noimageai"

Vercel — vercel.json

{
  "buildCommand": "pelican content -s pelicanconf.py",
  "outputDirectory": "output",
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        {
          "key": "X-Robots-Tag",
          "value": "noai, noimageai"
        }
      ]
    }
  ]
}

Cloudflare Pages — via EXTRA_PATH_METADATA

Cloudflare Pages reads a _headers file from the root of the published directory. Map it via EXTRA_PATH_METADATA:

pelicanconf.py

STATIC_PATHS = ['images', 'extra']
EXTRA_PATH_METADATA = {
    'extra/robots.txt': {'path': 'robots.txt'},
    'extra/_headers':   {'path': '_headers'},
}

extra/_headers

/*
  X-Robots-Tag: noai, noimageai

GitHub Pages

GitHub Pages does not support custom HTTP headers. The noai meta tag in base.html is your only option. For X-Robots-Tag or hard 403 blocking, migrate to Netlify, Vercel, or Cloudflare Pages.
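
Whichever platform serves the header, you can verify it on the deployed site with a short Python check (example.com is a placeholder for your deployed URL):

check_header.py

# verify the X-Robots-Tag response header on the live site
from urllib.request import Request, urlopen

url = "https://example.com/"  # replace with your deployed URL
resp = urlopen(Request(url, method="HEAD"))
print(resp.headers.get("X-Robots-Tag"))  # expect: noai, noimageai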

Hard 403 via edge functions

Netlify Edge Function

Create netlify/edge-functions/block-ai-bots.ts:

import type { Context } from '@netlify/edge-functions';

const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export default async function handler(
  request: Request,
  context: Context
): Promise<Response> {
  const ua = request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
}

export const config = { path: '/*' };

Register in netlify.toml:

[[edge_functions]]
  path = "/*"
  function = "block-ai-bots"

Cloudflare Pages Functions

Create functions/_middleware.ts at project root:

import type { PagesFunction } from '@cloudflare/workers-types';

const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export const onRequest: PagesFunction = async (context) => {
  const ua = context.request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
};
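
After deploying either function, a quick way to confirm the block is to request a page with an AI bot user agent (example.com is again a placeholder):

check_block.py

# confirm AI bot user agents receive 403 from the edge function
from urllib.request import Request, urlopen
from urllib.error import HTTPError

req = Request("https://example.com/", headers={"User-Agent": "GPTBot/1.0"})
try:
    urlopen(req)
    print("not blocked")
except HTTPError as e:
    print(e.code)  # expect: 403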

Deployment quick-reference

| Platform | Build command | Publish dir | Custom headers | Edge functions |
|---|---|---|---|---|
| Netlify | pelican content | output | ✅ netlify.toml | ✅ netlify/edge-functions/ |
| Vercel | pelican content | output | ✅ vercel.json | ⚠️ Next.js required |
| Cloudflare Pages | pelican content | output | ✅ extra/_headers via EXTRA_PATH_METADATA | ✅ functions/_middleware.ts |
| GitHub Pages | CI: pelican content | output | 🚫 No | 🚫 No |

Full pelicanconf.py

# pelicanconf.py
AUTHOR = 'Your Name'
SITENAME = 'My Site'
SITEURL = 'https://example.com'

PATH = 'content'
TIMEZONE = 'UTC'
DEFAULT_LANG = 'en'

# Theme
THEME = 'themes/mytheme'

# Static files — include extra/ for robots.txt and _headers
STATIC_PATHS = ['images', 'extra']
EXTRA_PATH_METADATA = {
    'extra/robots.txt': {'path': 'robots.txt'},
    'extra/_headers':   {'path': '_headers'},   # for Cloudflare Pages
}

# Plugins
PLUGIN_PATHS = ['plugins']
PLUGINS = []  # add 'robots_generator' for dynamic robots.txt

# Custom settings readable in templates
ROBOTS_META = 'noai, noimageai'

# Feed settings
FEED_ALL_ATOM = 'feeds/all.atom.xml'
CATEGORY_FEED_ATOM = 'feeds/{slug}.atom.xml'

# URL structure
ARTICLE_URL = '{category}/{slug}/'
ARTICLE_SAVE_AS = '{category}/{slug}/index.html'
PAGE_URL = '{slug}/'
PAGE_SAVE_AS = '{slug}/index.html'

FAQ

How do I add robots.txt to a Pelican site?

Use EXTRA_PATH_METADATA in pelicanconf.py: create extra/robots.txt, add 'extra' to STATIC_PATHS, and map "extra/robots.txt": {"path": "robots.txt"}. This copies it to output/robots.txt at the site root.

How do I add the noai meta tag to every Pelican page?

Edit themes/[theme]/templates/base.html. Add a Jinja2 conditional that reads article.robots or page.robots with a fallback of noai, noimageai. Copy the theme to a local directory first — never edit system-wide theme files.

How do I override robots on a specific Pelican article?

Add :robots: index, follow, noai, noimageai to the RST article header, or robots: ... to the Markdown metadata block. The template reads article.robots and falls back to the global default.

How do I add X-Robots-Tag to a Pelican site?

At the hosting layer. Netlify: netlify.toml. Vercel: vercel.json. Cloudflare Pages: extra/_headers mapped via EXTRA_PATH_METADATA.

Can I write a Pelican plugin to generate robots.txt dynamically?

Yes — connect to pelican.signals.finalized (fires after all output is written) and write robots.txt to the output path. Register with PLUGINS = ['robots_generator'] in pelicanconf.py.

Where is the base.html template in Pelican?

In your theme: themes/[theme-name]/templates/base.html. Copy the theme to a local path before editing. Set THEME = 'themes/mytheme' in pelicanconf.py. Hugo's lookup order doesn't apply here — you edit the theme file directly.
