
How to Block AI Bots on Zola (Rust SSG)

Zola is a fast, opinionated static site generator written in Rust. It ships as a single binary with no dependencies — no Node.js, no Ruby, no Go toolchain. Zola uses Tera templates, TOML front matter, and outputs static HTML to public/. Because there is no runtime server in production, AI bot protection combines robots.txt, noai meta tags, host-level response headers, and Edge Functions at the hosting layer.

8 min read · Updated April 2026 · Zola 0.19+

1. robots.txt

Zola copies everything in the static/ directory to public/ during the build — unchanged, no processing. Place your robots.txt here and it will be served at the root of your deployed site.

Static robots.txt

Create static/robots.txt:

# Block all AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Bytespider
Disallow: /

# Allow legitimate search crawlers
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /

Zola's build copies this verbatim to public/robots.txt. No config.toml entry is needed — unlike some SSGs that require explicit copy directives, Zola's static/ directory is always copied in full.
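After zola build, you can sanity-check the output with a small script. The sketch below uses a deliberately minimal robots.txt group parser (it only understands the simple User-agent/Allow/Disallow groups shown above, not the full spec) against an inline sample; in practice you would read public/robots.txt instead:

```typescript
// Minimal robots.txt group checker: verifies AI bots are disallowed
// site-wide while search crawlers stay allowed. Not a full spec
// implementation; it only handles simple User-agent/Allow/Disallow groups.
type Group = { agents: string[]; rules: { directive: string; path: string }[] };

function parseGroups(text: string): Group[] {
  const groups: Group[] = [];
  let current: Group | null = null;
  for (const raw of text.split("\n")) {
    const line = raw.replace(/#.*$/, "").trim();
    if (!line) continue;
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const key = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (key === "user-agent") {
      // Consecutive User-agent lines share one group; any rule closes it.
      if (!current || current.rules.length > 0) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value);
    } else if (current) {
      current.rules.push({ directive: key, path: value });
    }
  }
  return groups;
}

function blockedEverywhere(groups: Group[], agent: string): boolean {
  const group = groups.find((g) => g.agents.includes(agent));
  return group !== undefined &&
    group.rules.some((r) => r.directive === "disallow" && r.path === "/");
}

// Inline sample mirroring static/robots.txt; read public/robots.txt in practice.
const robots = [
  "# Block all AI training crawlers",
  "User-agent: GPTBot",
  "Disallow: /",
  "",
  "User-agent: Googlebot",
  "Allow: /",
].join("\n");

const groups = parseGroups(robots);
console.log("GPTBot blocked:", blockedEverywhere(groups, "GPTBot"));       // true
console.log("Googlebot blocked:", blockedEverywhere(groups, "Googlebot")); // false
```

Run it with any TypeScript runner (e.g. npx tsx check-robots.ts) after swapping the inline sample for the contents of public/robots.txt.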

Static vs templates: Zola does not process files in static/ through Tera templates. If you need a dynamic robots.txt that changes based on environment, you cannot use Zola's template engine for this file. Instead, use a build script that generates the file before zola build, or handle it at the hosting layer (e.g., Netlify's _redirects or Edge Functions).

Build script approach for environment-aware robots.txt

Since Zola cannot template files in static/, use a shell script that runs before the build:

#!/bin/bash
# build.sh — generate robots.txt then build
if [ "$DEPLOY_ENV" = "production" ]; then
  cat > static/robots.txt << 'EOF'
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
EOF
else
  cat > static/robots.txt << 'EOF'
# Staging — block all crawlers
User-agent: *
Disallow: /
EOF
fi

zola build

Set your build command to bash build.sh on your hosting platform and set DEPLOY_ENV=production as an environment variable.

2. noai meta tags in Tera templates

The noai and noimageai meta values signal to AI crawlers that the page content and images should not be used for training. Add them to your base template so every page is covered by default.

Base template

Zola uses Tera as its template engine. Edit templates/base.html:

<!DOCTYPE html>
<html lang="{{ lang }}">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>{% block title %}{{ config.title }}{% endblock %}</title>

  <!-- AI bot protection: default noai, allow override per-page -->
  {% if page.extra.robots %}
    <meta name="robots" content="{{ page.extra.robots }}">
  {% elif section.extra.robots %}
    <meta name="robots" content="{{ section.extra.robots }}">
  {% elif config.extra.default_robots %}
    <meta name="robots" content="{{ config.extra.default_robots }}">
  {% else %}
    <meta name="robots" content="noai, noimageai">
  {% endif %}

  {% block head %}{% endblock %}
</head>
<body>
  {% block content %}{% endblock %}
</body>
</html>

page vs section context: In Zola, page.html templates receive a page variable and section.html templates receive a section variable. The base template may be rendered in either context, so check both page.extra and section.extra. Tera's {% if %} treats undefined variables as falsy, which is why the conditional chain above is safe in both contexts; piping page.extra.robots through the default() filter when page itself is undefined can still raise a template error, so prefer {% if %} guards here.

Simplified with Tera default filter

If you only render the base template from page contexts (not section contexts), you can use a simpler one-liner:

<meta name="robots"
  content="{{ page.extra.robots | default(value=config.extra.default_robots | default(value='noai, noimageai')) }}">

Tera vs Jinja2: Tera's default() filter uses named-parameter syntax: default(value="fallback"). This differs from Jinja2's default("fallback"); positional arguments will cause a template error in Tera.

Per-page override via [extra] front matter

Zola uses TOML front matter delimited by +++. Custom fields must go in the [extra] table — they cannot be top-level:

+++
title = "About"
date = 2026-04-18

[extra]
robots = "index, follow"
+++

About page content here...

This overrides the template default for that page only. The meta tag will render content="index, follow" instead of content="noai, noimageai".

Common mistake: Putting robots = "index, follow" at the top level of the front matter (outside [extra]) will cause a Zola build error — Zola's front matter schema is strict, and unknown top-level keys are rejected. Always use [extra] for custom fields.

Section-level override

Sections in Zola use _index.md files. To allow all blog posts to be indexed, set the override in content/blog/_index.md:

+++
title = "Blog"
sort_by = "date"
paginate_by = 10

[extra]
robots = "index, follow"
+++

Access section-level extras in section.html via section.extra.robots. Note that page templates receive a page variable, not section, so posts inside the section will not pick this value up automatically from the conditional chain in Section 2; each post can set its own [extra] values, or the template can look up the parent section explicitly.
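One way to let individual pages fall back to their parent section's value is an explicit lookup with Zola's get_section function (page.ancestors lists parent section paths, nearest parent last). A sketch to adapt into base.html:

```html
{# Sketch: let a page inherit robots from its parent section. #}
{% if page.extra.robots %}
  <meta name="robots" content="{{ page.extra.robots }}">
{% elif page.ancestors %}
  {% set parent = get_section(path=page.ancestors | last) %}
  {% if parent.extra.robots %}
    <meta name="robots" content="{{ parent.extra.robots }}">
  {% endif %}
{% endif %}
```

In a section context both page.extra.robots and page.ancestors are undefined and therefore falsy, so the chain degrades safely.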

3. Site-wide defaults via config.toml

Unlike Hugo's _default/baseof.html cascade or Jekyll's defaults: in _config.yml, Zola has no built-in front matter defaults system. The recommended pattern is to define defaults in config.toml under [extra] and reference them with fallbacks in your templates.

# config.toml
base_url = "https://yoursite.com"
title = "Your Site"
compile_sass = true
build_search_index = false
generate_feeds = true

[extra]
# Default robots value — used as fallback in base.html template
default_robots = "noai, noimageai"

The template from Section 2 checks page.extra.robots first, then section.extra.robots, then config.extra.default_robots, and finally falls back to the hardcoded noai, noimageai string. This cascade gives you per-page overrides, per-section overrides, a site-wide default, and a safe fallback when no default is configured.
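The fallback order can be written out as a small function to make the resolution explicit (the names here are illustrative, not part of Zola's API):

```typescript
// Illustrates the fallback order the base.html template implements.
type Extra = { robots?: string };

function resolveRobots(page?: Extra, section?: Extra, siteConfig?: Extra): string {
  return page?.robots ?? section?.robots ?? siteConfig?.robots ?? "noai, noimageai";
}

console.log(resolveRobots({ robots: "index, follow" }, undefined, { robots: "noai, noimageai" }));
// -> "index, follow": the page-level value wins
console.log(resolveRobots(undefined, undefined, undefined));
// -> "noai, noimageai": hardcoded fallback
```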

4. X-Robots-Tag via host headers

Zola outputs static HTML — there is no application server adding HTTP headers in production. Set X-Robots-Tag at your hosting layer.

Netlify

In netlify.toml at the project root:

[build]
  command = "zola build"
  publish = "public"

[build.environment]
  ZOLA_VERSION = "0.19.2"

[[headers]]
  for = "/*"
  [headers.values]
    X-Robots-Tag = "noai, noimageai"
    X-Content-Type-Options = "nosniff"
    X-Frame-Options = "SAMEORIGIN"

Netlify + Zola: Netlify has first-class Zola support. Set ZOLA_VERSION in [build.environment] to pin the version. The publish directory is public (Zola's default output directory).

Vercel

In vercel.json at the project root:

{
  "buildCommand": "zola build",
  "outputDirectory": "public",
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        {
          "key": "X-Robots-Tag",
          "value": "noai, noimageai"
        }
      ]
    }
  ]
}

Vercel Zola support: Vercel does not auto-detect Zola. You must specify the build command in vercel.json or project settings, and ensure the Zola binary is available in your build environment (install via a build script or use a Docker build).

Cloudflare Pages

Create static/_headers. Zola copies all files in static/ to public/, placing this at public/_headers where Cloudflare Pages reads it:

/*
  X-Robots-Tag: noai, noimageai

Cloudflare Pages + Zola: Cloudflare Pages has native Zola support. Set the build command to zola build and the output directory to public in your Pages project settings. The _headers file in static/ is automatically placed at the output root.

GitHub Pages

GitHub Pages does not support custom HTTP headers. Use the noai meta tag approach (Section 2) for GitHub Pages deployments. For header-level control, use Netlify, Vercel, or Cloudflare Pages.

5. Hard 403 via Edge Functions

A hard 403 blocks the AI bot before it reads any content — more effective than signals that a crawler can choose to ignore. Requires server-side execution at the edge.

Netlify Edge Function

Create netlify/edge-functions/bot-block.ts:

import type { Config, Context } from "@netlify/edge-functions";

const AI_BOTS = [
  "GPTBot", "ClaudeBot", "Claude-Web", "anthropic-ai",
  "CCBot", "Google-Extended", "PerplexityBot",
  "Applebot-Extended", "Amazonbot", "meta-externalagent",
  "Bytespider", "DuckAssistBot", "YouBot",
];

export default async function handler(req: Request, _ctx: Context) {
  const ua = req.headers.get("user-agent") ?? "";
  const isBot = AI_BOTS.some((bot) => ua.includes(bot));

  if (isBot) {
    return new Response("Forbidden", {
      status: 403,
      headers: { "content-type": "text/plain" },
    });
  }
}

export const config: Config = {
  path: "/*",
};

Register in netlify.toml:

[build]
  command = "zola build"
  publish = "public"

[build.environment]
  ZOLA_VERSION = "0.19.2"

[[edge_functions]]
  path = "/*"
  function = "bot-block"
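The substring matcher in the edge function can be exercised locally against sample User-Agent strings (the UAs below are illustrative). One caveat: Google-Extended is a robots.txt product token rather than a crawling User-Agent, so keeping it in a UA blocklist is harmless but will not match live traffic:

```typescript
// Standalone check of the substring matcher used in the edge function.
const AI_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "PerplexityBot", "Bytespider"];

function isAiBot(ua: string): boolean {
  return AI_BOTS.some((bot) => ua.includes(bot));
}

// Illustrative UA strings, not an exhaustive list.
console.log(isAiBot("Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot")); // true
console.log(isAiBot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")); // false
```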

Vercel middleware

Create middleware.ts at the project root (same level as vercel.json, not inside public/ or templates/). Because a Zola site is not a Next.js project, use Vercel's framework-agnostic Edge Middleware helpers from the @vercel/edge package rather than next/server:

import { next } from "@vercel/edge";

const AI_BOTS = [
  "GPTBot", "ClaudeBot", "Claude-Web", "anthropic-ai",
  "CCBot", "Google-Extended", "PerplexityBot",
  "Applebot-Extended", "Amazonbot", "meta-externalagent",
  "Bytespider",
];

export default function middleware(request: Request) {
  const ua = request.headers.get("user-agent") ?? "";
  const isBot = AI_BOTS.some((bot) => ua.includes(bot));

  if (isBot) {
    return new Response("Forbidden", { status: 403 });
  }
  return next();
}

export const config = {
  matcher: ["/((?!favicon.ico).*)"],
};

Cloudflare Pages middleware

Create functions/_middleware.ts at the project root (the functions/ directory is separate from Zola's source and is not processed by the build):

// functions/_middleware.ts
const AI_BOTS = [
  "GPTBot", "ClaudeBot", "CCBot", "Google-Extended",
  "PerplexityBot", "Applebot-Extended", "Amazonbot",
  "meta-externalagent", "Bytespider",
];

export async function onRequest(context: EventContext<any, any, any>) {
  const ua = context.request.headers.get("user-agent") ?? "";
  const isBot = AI_BOTS.some((bot) => ua.includes(bot));

  if (isBot) {
    return new Response("Forbidden", { status: 403 });
  }
  return context.next();
}

6. Full config.toml example

A complete Zola config.toml with AI bot protection defaults and standard settings:

# config.toml
base_url = "https://yoursite.com"
title = "Your Site"
description = "Your site description"

# Build settings
compile_sass = true
build_search_index = false
generate_feeds = true
feed_filenames = ["atom.xml"]

# Minification
minify_html = true

# Taxonomies (optional — common setup)
taxonomies = [
  { name = "tags", feed = true },
  { name = "categories" },
]

[markdown]
highlight_code = true
highlight_theme = "css"

[extra]
# AI bot protection default — referenced in templates/base.html
default_robots = "noai, noimageai"

# Your site-specific extras
author = "Your Name"
twitter = "@yourhandle"

Pair this with the base template from Section 2. The template checks page.extra.robots → section.extra.robots → config.extra.default_robots → hardcoded fallback, giving you granular control at every level.

Zola project structure with AI protection

yoursite/
├── config.toml              # [extra] default_robots
├── content/
│   ├── _index.md            # Homepage
│   └── blog/
│       ├── _index.md        # [extra] robots = "index, follow"
│       └── first-post.md    # Inherits from section or overrides
├── static/
│   ├── robots.txt           # Copied to public/robots.txt
│   └── _headers             # For Cloudflare Pages
├── templates/
│   ├── base.html            # noai meta tag with cascade
│   ├── index.html           # Homepage template
│   ├── page.html            # Content pages
│   └── section.html         # Section listings
├── netlify.toml             # X-Robots-Tag + Edge Function config
└── netlify/
    └── edge-functions/
        └── bot-block.ts     # Hard 403 for AI bots

7. Deployment comparison

Zola's build command is zola build and its output directory is public. Here is how each host handles AI bot protection:

| Host | robots.txt | noai meta | X-Robots-Tag | Hard 403 |
|---|---|---|---|---|
| Netlify | static/robots.txt → public/ ✓ | Tera template ✓ | netlify.toml [[headers]] ✓ | Edge Function ✓ |
| Vercel | static/robots.txt → public/ ✓ | Tera template ✓ | vercel.json headers ✓ | middleware.ts ✓ |
| Cloudflare Pages | static/robots.txt → public/ ✓ | Tera template ✓ | static/_headers → public/ ✓ | functions/_middleware.ts ✓ |
| GitHub Pages | static/robots.txt → public/ ✓ | Tera template ✓ | Not supported ✗ | Not supported ✗ |
| Fly.io | static/robots.txt → public/ ✓ | Tera template ✓ | Dockerfile static server ✓ | Custom server ✓ |

For full protection — robots.txt + meta tags + X-Robots-Tag + hard 403 — deploy to Netlify (best native support), Cloudflare Pages, or Vercel. GitHub Pages lacks header-level and edge-level bot blocking.

FAQ

How do I add robots.txt to a Zola site?

Place robots.txt in your static/ directory. Zola copies everything in static/ to the public/ output directory during build — no configuration required. The file will be available at yoursite.com/robots.txt automatically.

How do I add noai meta tags to Zola templates?

In your Tera base template (templates/base.html), use a conditional chain:

{% if page.extra.robots %}
  <meta name="robots" content="{{ page.extra.robots }}">
{% elif config.extra.default_robots %}
  <meta name="robots" content="{{ config.extra.default_robots }}">
{% else %}
  <meta name="robots" content="noai, noimageai">
{% endif %}

The page.extra object contains fields from the [extra] section in TOML front matter. Use config.extra for site-wide defaults defined in config.toml.

How do I set a site-wide robots default?

Add to config.toml:

[extra]
default_robots = "noai, noimageai"

Access it in templates via config.extra.default_robots. Unlike Jekyll or Hugo, Zola has no built-in front matter defaults cascade — the config.extra approach combined with Tera template fallbacks is the standard pattern.

What is the [extra] section in Zola front matter?

Zola's front matter is strict TOML between +++ delimiters. Standard fields (title, date, description, taxonomies) go at the top level. Custom fields like robots must go inside the [extra] table — putting them at the top level will cause a build error. Access them in templates as page.extra.field_name.

How is Zola different from Hugo for AI bot blocking?

Key differences:

- Defaults: Hugo has a built-in front matter cascade; Zola has no defaults system, so site-wide robots values live in config.toml [extra] with template-level fallbacks.
- Templates: Hugo uses Go templates; Zola uses Tera, whose default() filter requires named-parameter syntax (default(value="...")).
- Front matter: Zola's TOML schema is strict, and custom fields like robots must sit under [extra]; Hugo accepts arbitrary top-level params.
- Static files: both copy a static directory to the output verbatim, so the static/robots.txt approach works the same way in each.

Will blocking AI bots affect my SEO?

Blocking AI-specific crawlers (GPTBot, ClaudeBot, CCBot, Google-Extended) does not affect standard search engine indexing. Googlebot and Bingbot are separate user agents from Google-Extended and are not blocked by the configurations in this guide. Always include explicit Allow rules for Googlebot and Bingbot in your robots.txt to make your intent unambiguous.
