
How to Block AI Bots on Lume: Complete 2026 Guide

Lume is a fast, flexible static site generator built on Deno. It supports Nunjucks, Markdown, JSX, TypeScript, and more — and outputs plain static HTML to _site/. Because Lume has no server process in production, AI bot protection uses a combination of robots.txt, noai meta tags in layouts, host-level header config, and Edge Functions for hard blocking.

robots.txt — static copy & dynamic route

Option 1: static copy (simplest)

Place robots.txt in your source directory root and tell Lume to copy it as-is via site.copy() in _config.ts:

// _config.ts
import lume from "lume/mod.ts";

const site = lume();

site.copy("robots.txt");   // copies <source root>/robots.txt → _site/robots.txt

export default site;
# robots.txt (in your source root)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: *
Allow: /
site.copy() path: The argument is relative to your source directory (default: ./, or the src value passed to lume()). If your source is src/, place the file at src/robots.txt and call site.copy("robots.txt"). Lume copies it verbatim to _site/robots.txt — no template processing.

Option 2: dynamic robots.txt page

For environment-based content (different rules for staging vs production), create a Lume page that generates the file:

// src/robots.txt.ts
export const url = "/robots.txt";

const isProduction = Deno.env.get("LUME_ENV") !== "staging";

const aiBlockRules = `User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /
`;

const stagingRules = `User-agent: *
Disallow: /
`;

export default function () {
  if (!isProduction) {
    // On staging: block all crawlers including Google
    return stagingRules;
  }

  return `${aiBlockRules}
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml`;
}
Static copy takes precedence: If you have both a robots.txt file copied via site.copy() and a src/robots.txt.ts page, Lume's page wins — it overwrites the copied file in _site/. Use one approach, not both.
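Since the dynamic route just returns a string, you can also generate the rules block from an array of bot names instead of maintaining the heredoc by hand. A minimal sketch — buildRobotsTxt and AI_BOTS are illustrative names, not Lume APIs:

```typescript
// Illustrative helper: build robots.txt Disallow blocks from a bot list.
export const AI_BOTS = [
  "GPTBot", "ClaudeBot", "anthropic-ai", "CCBot", "Google-Extended",
  "AhrefsBot", "Bytespider", "Amazonbot", "Diffbot", "FacebookBot",
  "cohere-ai", "PerplexityBot", "YouBot",
];

export function buildRobotsTxt(bots: string[], sitemapUrl?: string): string {
  // One "User-agent: X / Disallow: /" block per bot...
  const blocks = bots.map((bot) => `User-agent: ${bot}\nDisallow: /`);
  // ...followed by a catch-all allow for legitimate crawlers.
  blocks.push("User-agent: *\nAllow: /");
  let out = blocks.join("\n\n") + "\n";
  if (sitemapUrl) out += `\nSitemap: ${sitemapUrl}\n`;
  return out;
}
```

In src/robots.txt.ts, the default export would then return buildRobotsTxt(AI_BOTS, "https://yourdomain.com/sitemap.xml") in production and the block-all string on staging.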

noai meta tag in layouts

Lume supports multiple template engines. Add the robots meta tag to your base layout in _includes/:

Nunjucks layout (_includes/layout.njk)

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>{{ title }}</title>

  {# Block AI training by default; override per-page with robots: index, follow #}
  <meta name="robots" content="{{ robots | default('noai, noimageai') }}">
</head>
<body>
  {{ content | safe }}
</body>
</html>

JSX/TSX layout (_includes/Layout.tsx)

interface Props {
  title: string;
  robots?: string;
  children: unknown;
}

export default ({ title, robots = "noai, noimageai", children }: Props) => (
  <html lang="en">
    <head>
      <meta charSet="UTF-8" />
      <meta name="viewport" content="width=device-width, initial-scale=1.0" />
      <title>{title}</title>
      <meta name="robots" content={robots} />
    </head>
    <body>{children}</body>
  </html>
);

Liquid layout (_includes/layout.liquid)

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>{{ title }}</title>
  <meta name="robots" content="{{ robots | default: 'noai, noimageai' }}">
</head>
<body>
  {{ content }}
</body>
</html>

Per-page override (front matter)

---
title: About This Site
layout: layout.njk
robots: index, follow, max-image-preview:large
---

Page content here.
Lume data cascade: Front matter values flow through the _data.yml / _data.ts cascade. Set a global default for all pages in _data.yml at the root level: robots: "noai, noimageai". Per-page front matter overrides it. This avoids updating every layout file individually.

Global default via _data.yml

# _data.yml (applies to all pages in the directory and subdirectories)
robots: "noai, noimageai"
layout: layout.njk

X-Robots-Tag via host headers config

Lume outputs a static site — there is no server process in production. Response headers must be configured at the hosting layer.

Netlify (netlify.toml)

# netlify.toml
[build]
  publish = "_site"
  command = "deno task build"

[[headers]]
  for = "/*"
  [headers.values]
    X-Robots-Tag = "noai, noimageai"
    X-Content-Type-Options = "nosniff"
    X-Frame-Options = "SAMEORIGIN"

Vercel (vercel.json)

{
  "outputDirectory": "_site",
  "buildCommand": "deno task build",
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        { "key": "X-Robots-Tag", "value": "noai, noimageai" },
        { "key": "X-Content-Type-Options", "value": "nosniff" }
      ]
    }
  ]
}

Cloudflare Pages (_headers file)

# _headers (place in your source root, copy via site.copy("_headers"))
/*
  X-Robots-Tag: noai, noimageai
  X-Content-Type-Options: nosniff
  X-Frame-Options: SAMEORIGIN

// _config.ts — copy _headers to _site/
site.copy("_headers");

Cloudflare Pages _headers: Cloudflare Pages reads the _headers file from your publish directory (_site/). Because Lume ignores files starting with _ by default, you must explicitly add site.copy("_headers") to _config.ts.

Hard 403 via Edge Functions

Robots.txt and meta tags are advisory — determined bots ignore them. Edge Functions enforce hard 403 responses before any HTML is served.

Netlify Edge Function

// netlify/edge-functions/bot-block.ts
import type { Context } from "https://edge.netlify.com/";

const BOT_PATTERN =
  /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot/i;

export default async (request: Request, context: Context): Promise<Response> => {
  const ua = request.headers.get("user-agent") ?? "";

  if (BOT_PATTERN.test(ua)) {
    return new Response("Forbidden", {
      status: 403,
      headers: { "Content-Type": "text/plain" },
    });
  }

  return context.next();
};

export const config = { path: "/*" };

Vercel middleware (project root)

// middleware.ts (place in project root — NOT in _site/)
// Framework-agnostic Vercel Edge Middleware: a Lume project is not a
// Next.js app, so don't import from "next/server" here. Return a Response
// to short-circuit; return nothing to continue to the static file.

const BOT_PATTERN =
  /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot/i;

export default function middleware(req: Request): Response | undefined {
  const ua = req.headers.get("user-agent") ?? "";
  if (BOT_PATTERN.test(ua)) {
    return new Response("Forbidden", { status: 403 });
  }
  // No return value: Vercel serves the static asset from _site/ as usual.
}

export const config = { matcher: ["/((?!favicon.ico).*)"] };
Vercel middleware.ts is not inside _site/: The middleware lives in your project root alongside _config.ts and vercel.json. Lume's build output in _site/ is the static site — the middleware is a Vercel-specific file that Vercel picks up from the project root. Never put middleware.ts inside _site/.

Cloudflare Pages (_middleware.ts)

// functions/_middleware.ts
import type { PagesFunction } from "@cloudflare/workers-types";

const BOT_PATTERN =
  /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot/i;

export const onRequest: PagesFunction = async (context) => {
  const ua = context.request.headers.get("user-agent") ?? "";

  if (BOT_PATTERN.test(ua)) {
    return new Response("Forbidden", {
      status: 403,
      headers: { "Content-Type": "text/plain" },
    });
  }

  return context.next();
};
functions/ is outside _site/: Cloudflare Pages Functions live in a functions/ directory at the project root — not inside your publish directory. The _middleware.ts file in functions/ intercepts all requests before the static files are served.
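All three platform functions share the same BOT_PATTERN regex, so it is worth sanity-checking it against realistic User-Agent strings before deploying. A plain-TypeScript check (the sample UA strings below are illustrative):

```typescript
// The shared UA pattern from the edge function examples above.
const BOT_PATTERN =
  /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot/i;

export function isBlockedBot(userAgent: string): boolean {
  return BOT_PATTERN.test(userAgent);
}

// Bots embed their name inside a longer UA string, so a substring match
// with the `i` flag is enough. Note that plain Googlebot is deliberately
// NOT in the list — only Google-Extended (the AI training agent) is.
```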

Deno Deploy bot-blocking entrypoint

Lume has a first-class integration with Deno Deploy. When using a custom server entrypoint, you can implement bot blocking before serving static files:

// server.ts (Deno Deploy entrypoint)
import { serveDir } from "jsr:@std/http/file-server";

const BOT_PATTERN =
  /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot/i;

Deno.serve(async (req: Request): Promise<Response> => {
  const ua = req.headers.get("user-agent") ?? "";

  // Block AI bots before serving any file
  if (BOT_PATTERN.test(ua)) {
    return new Response("Forbidden", {
      status: 403,
      headers: {
        "Content-Type": "text/plain",
        "X-Robots-Tag": "noai, noimageai",
      },
    });
  }

  // Inject X-Robots-Tag on all served responses
  const res = await serveDir(req, {
    fsRoot: "_site",
    urlRoot: "",
    quiet: true,
  });

  const headers = new Headers(res.headers);
  headers.set("X-Robots-Tag", "noai, noimageai");

  return new Response(res.body, {
    status: res.status,
    statusText: res.statusText,
    headers,
  });
});
// deno.json — deployment tasks
{
  "tasks": {
    "build": "deno run -A https://deno.land/x/lume/ci.ts",
    "serve": "deno run -A https://deno.land/x/lume/cli.ts --serve",
    "deploy": "deployctl deploy --project=my-lume-site server.ts"
  }
}
serveDir response immutability: serveDir() returns a Response object whose headers you cannot modify in place. Create a new Response with new Headers(res.headers), set your header, then wrap the original res.body in a new Response. The body stream passes through without buffering.
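The wrap-and-copy step can be factored into a small helper. A sketch that works with any Web-standard Response (withHeader is a name chosen here, not a std library function):

```typescript
// Return a new Response identical to `res` but with one extra header.
// Needed because the headers on a Response from serveDir may be immutable.
export function withHeader(res: Response, name: string, value: string): Response {
  const headers = new Headers(res.headers);  // copy is mutable
  headers.set(name, value);
  // Wrapping res.body streams it through without buffering.
  return new Response(res.body, {
    status: res.status,
    statusText: res.statusText,
    headers,
  });
}
```

In server.ts the tail of the handler then collapses to: return withHeader(await serveDir(req, { fsRoot: "_site" }), "X-Robots-Tag", "noai, noimageai").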

Deploy to Deno Deploy with GitHub Actions

# .github/workflows/deploy.yml
name: Deploy to Deno Deploy
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: denoland/setup-deno@v2
        with:
          deno-version: v2.x
      - name: Build Lume site
        run: deno task build
      - name: Deploy to Deno Deploy
        uses: denoland/deployctl@v1
        with:
          project: my-lume-site
          entrypoint: server.ts

Full _config.ts example

// _config.ts
import lume from "lume/mod.ts";
import nunjucks from "lume/plugins/nunjucks.ts";
import markdown from "lume/plugins/markdown.ts";
import jsx from "lume/plugins/jsx.ts";
import sitemap from "lume/plugins/sitemap.ts";
import minifyHTML from "lume/plugins/minify_html.ts";

const site = lume({
  src: "./src",
  dest: "./_site",
  location: new URL("https://yourdomain.com"),
});

// Plugins
site.use(nunjucks());
site.use(markdown());
site.use(jsx());
site.use(sitemap());
site.use(minifyHTML());

// Copy static files as-is to _site/
site.copy("robots.txt");
site.copy("_headers");    // Cloudflare Pages headers
site.copy("favicon.ico");
site.copy("static", ".");  // src/static/ → _site/ (merge at root)

// Global data available in all templates
site.data("robots", "noai, noimageai");   // default robots value
site.data("site", {
  title: "My Lume Site",
  url: "https://yourdomain.com",
});

export default site;
site.data() for global defaults: site.data("robots", "noai, noimageai") sets a global data value available in all templates as {{ robots }} (Nunjucks) or page.data.robots (JSX). Per-page front matter overrides it. Alternatively, use _data.yml in the source root — both approaches work, but site.data() is more visible and IDE-friendly.
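The cascade's precedence order behaves like a plain object spread — later, more specific sources win. A toy model of the resolution (resolvePageData is illustrative, not Lume's actual implementation):

```typescript
// Toy model of the Lume data cascade: site-wide defaults are overridden
// by directory _data files, which are overridden by page front matter.
type Data = Record<string, unknown>;

export function resolvePageData(
  siteData: Data,     // site.data(...) in _config.ts
  dirData: Data,      // _data.yml / _data.ts in the page's directory
  frontMatter: Data,  // the page's own front matter
): Data {
  return { ...siteData, ...dirData, ...frontMatter };
}
```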

Deployment comparison

| Platform | robots.txt | X-Robots-Tag | Hard 403 | Notes |
| --- | --- | --- | --- | --- |
| Deno Deploy | serveDir auto-serves it | server.ts response headers | server.ts UA check | Native Lume target; full control |
| Netlify | Copied to _site/ | netlify.toml [[headers]] | Edge Function | netlify.toml publish = "_site" |
| Vercel | Copied to _site/ | vercel.json headers | middleware.ts (project root) | outputDirectory: "_site" |
| Cloudflare Pages | Copied to _site/ | _headers file (copy it) | functions/_middleware.ts | _headers needs site.copy("_headers") |
| GitHub Pages | Copied to _site/ | Not supported | Not supported | No custom headers; noai meta only |
| Firebase Hosting | Copied to _site/ | firebase.json headers | Cloud Functions rewrite | public: "_site" in firebase.json |

FAQ

How do I add robots.txt to a Lume site?

Two options: (1) Static copy — place robots.txt in your source directory and add site.copy("robots.txt") in _config.ts. Lume copies it verbatim to _site/. (2) Dynamic route — create src/robots.txt.ts that exports url: "/robots.txt" and a default function returning the file content as a string. If both exist, the page route wins.

How do I add noai meta tags to a Lume layout?

Add <meta name="robots" content="{{ robots | default('noai, noimageai') }}"> to your base layout in _includes/. Override per-page with robots: index, follow in front matter. For a global default without touching every layout, use site.data("robots", "noai, noimageai") in _config.ts.

How do I add X-Robots-Tag to a Lume static site?

Lume has no server process in production — headers are set at the hosting layer. Netlify: [[headers]] in netlify.toml. Vercel: headers() in vercel.json. Cloudflare Pages: _headers file copied via site.copy("_headers"). Deno Deploy: set headers in your server.ts entrypoint.

How do I hard-block AI bots with a 403 on a Lume site?

Hard 403 needs server-side execution. Netlify: Edge Function in netlify/edge-functions/. Vercel: middleware.ts in the project root (not inside _site/). Cloudflare Pages: functions/_middleware.ts. Deno Deploy: check User-Agent in server.ts before calling serveDir().

What is the output directory in Lume?

_site/ by default. Change with lume({ dest: "dist" }) in _config.ts. Point your hosting platform's publish directory to whichever value you set — Netlify: publish = "_site", Vercel: outputDirectory: "_site".

How does bot blocking work on Deno Deploy with a Lume site?

Use a custom server.ts entrypoint that checks req.headers.get("user-agent") before calling serveDir(req, { fsRoot: "_site" }). Matching bots return new Response("Forbidden", { status: 403 }) immediately. Legitimate requests pass through to the static file server.
