
How to Block AI Bots on Lume: Complete 2026 Guide

Lume is a fast, flexible static site generator built on Deno. It supports Nunjucks, Markdown, JSX, TypeScript, and more — and outputs plain static HTML to _site/. Because Lume has no server process in production, AI bot protection uses a combination of robots.txt, noai meta tags in layouts, host-level header config, and Edge Functions for hard blocking.

robots.txt — static copy & dynamic route

Option 1: static copy (simplest)

Place robots.txt in your source directory root and tell Lume to copy it as-is via site.copy() in _config.ts:

// _config.ts
import lume from "lume/mod.ts";

const site = lume();

site.copy("robots.txt");   // copies <source root>/robots.txt → _site/robots.txt

export default site;
# robots.txt (in your source root)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: *
Allow: /
site.copy() path: The argument is relative to your source directory (default: ./, or the src value passed to lume()). If your source is src/, place the file at src/robots.txt and call site.copy("robots.txt"). Lume copies it verbatim to _site/robots.txt — no template processing.

Option 2: dynamic robots.txt page

For environment-based content (different rules for staging vs production), create a Lume page that generates the file:

// src/robots.txt.ts
export const url = "/robots.txt";

const isProduction = Deno.env.get("LUME_ENV") !== "staging";

const aiBlockRules = `User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /
`;

const stagingRules = `User-agent: *
Disallow: /
`;

export default function () {
  if (!isProduction) {
    // On staging: block all crawlers including Google
    return stagingRules;
  }

  return `${aiBlockRules}
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml`;
}
Static copy takes precedence: If you have both a robots.txt file copied via site.copy() and a src/robots.txt.ts page, Lume's page wins — it overwrites the copied file in _site/. Use one approach, not both.
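Since the dynamic route just returns a string, you can also generate the rules block from an array of bot names instead of maintaining the heredoc by hand. A minimal sketch — buildRobotsTxt and AI_BOTS are illustrative names, not Lume APIs:

```typescript
// Illustrative helper: build robots.txt Disallow blocks from a bot list.
export const AI_BOTS = [
  "GPTBot", "ClaudeBot", "anthropic-ai", "CCBot", "Google-Extended",
  "AhrefsBot", "Bytespider", "Amazonbot", "Diffbot", "FacebookBot",
  "cohere-ai", "PerplexityBot", "YouBot",
];

export function buildRobotsTxt(bots: string[], sitemapUrl?: string): string {
  // One "User-agent: X / Disallow: /" block per bot...
  const blocks = bots.map((bot) => `User-agent: ${bot}\nDisallow: /`);
  // ...followed by a catch-all allow for legitimate crawlers.
  blocks.push("User-agent: *\nAllow: /");
  let out = blocks.join("\n\n") + "\n";
  if (sitemapUrl) out += `\nSitemap: ${sitemapUrl}\n`;
  return out;
}
```

In src/robots.txt.ts, the default export would then return buildRobotsTxt(AI_BOTS, "https://yourdomain.com/sitemap.xml") in production and the block-all string on staging.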

noai meta tag in layouts

Lume supports multiple template engines. Add the robots meta tag to your base layout in _includes/:

Nunjucks layout (_includes/layout.njk)

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>{{ title }}</title>

  {# Block AI training by default; override per-page with robots: index, follow #}
  <meta name="robots" content="{{ robots | default('noai, noimageai') }}">
</head>
<body>
  {{ content | safe }}
</body>
</html>

JSX/TSX layout (_includes/Layout.tsx)

interface Props {
  title: string;
  robots?: string;
  children: unknown;
}

export default ({ title, robots = "noai, noimageai", children }: Props) => (
  <html lang="en">
    <head>
      <meta charSet="UTF-8" />
      <meta name="viewport" content="width=device-width, initial-scale=1.0" />
      <title>{title}</title>
      <meta name="robots" content={robots} />
    </head>
    <body>{children}</body>
  </html>
);

Liquid layout (_includes/layout.liquid)

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>{{ title }}</title>
  <meta name="robots" content="{{ robots | default: 'noai, noimageai' }}">
</head>
<body>
  {{ content }}
</body>
</html>

Per-page override (front matter)

---
title: About This Site
layout: layout.njk
robots: index, follow, max-image-preview:large
---

Page content here.
Lume data cascade: Front matter values flow through the _data.yml / _data.ts cascade. Set a global default for all pages in _data.yml at the root level: robots: "noai, noimageai". Per-page front matter overrides it. This avoids updating every layout file individually.

Global default via _data.yml

# _data.yml (applies to all pages in the directory and subdirectories)
robots: "noai, noimageai"
layout: layout.njk

X-Robots-Tag via host headers config

Lume outputs a static site — there is no server process in production. Response headers must be configured at the hosting layer.

Netlify (netlify.toml)

# netlify.toml
[build]
  publish = "_site"
  command = "deno task build"

[[headers]]
  for = "/*"
  [headers.values]
    X-Robots-Tag = "noai, noimageai"
    X-Content-Type-Options = "nosniff"
    X-Frame-Options = "SAMEORIGIN"

Vercel (vercel.json)

{
  "outputDirectory": "_site",
  "buildCommand": "deno task build",
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        { "key": "X-Robots-Tag", "value": "noai, noimageai" },
        { "key": "X-Content-Type-Options", "value": "nosniff" }
      ]
    }
  ]
}

Cloudflare Pages (_headers file)

# _headers (place in your source root, copy via site.copy("_headers"))
/*
  X-Robots-Tag: noai, noimageai
  X-Content-Type-Options: nosniff
  X-Frame-Options: SAMEORIGIN

// _config.ts — copy _headers to _site/
site.copy("_headers");

Cloudflare Pages _headers: Cloudflare Pages reads the _headers file from your publish directory (_site/). Because Lume ignores files starting with _ by default, you must explicitly add site.copy("_headers") to _config.ts.

Hard 403 via Edge Functions

Robots.txt and meta tags are advisory — determined bots ignore them. Edge Functions enforce hard 403 responses before any HTML is served.

Netlify Edge Function

// netlify/edge-functions/bot-block.ts
import type { Context } from "https://edge.netlify.com/";

const BOT_PATTERN =
  /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot/i;

export default async (request: Request, context: Context): Promise<Response> => {
  const ua = request.headers.get("user-agent") ?? "";

  if (BOT_PATTERN.test(ua)) {
    return new Response("Forbidden", {
      status: 403,
      headers: { "Content-Type": "text/plain" },
    });
  }

  return context.next();
};

export const config = { path: "/*" };

Vercel middleware (project root)

// middleware.ts (place in project root — NOT in _site/)
// Framework-agnostic Vercel Edge Middleware: a Lume project is not a
// Next.js app, so don't import from "next/server" here. Return a Response
// to short-circuit; return nothing to continue to the static file.

const BOT_PATTERN =
  /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot/i;

export default function middleware(req: Request): Response | undefined {
  const ua = req.headers.get("user-agent") ?? "";
  if (BOT_PATTERN.test(ua)) {
    return new Response("Forbidden", { status: 403 });
  }
  // No return value: Vercel serves the static asset from _site/ as usual.
}

export const config = { matcher: ["/((?!favicon.ico).*)"] };
Vercel middleware.ts is not inside _site/: The middleware lives in your project root alongside _config.ts and vercel.json. Lume's build output in _site/ is the static site — the middleware is a Vercel-specific file that Vercel picks up from the project root. Never put middleware.ts inside _site/.

Cloudflare Pages (_middleware.ts)

// functions/_middleware.ts
import type { PagesFunction } from "@cloudflare/workers-types";

const BOT_PATTERN =
  /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot/i;

export const onRequest: PagesFunction = async (context) => {
  const ua = context.request.headers.get("user-agent") ?? "";

  if (BOT_PATTERN.test(ua)) {
    return new Response("Forbidden", {
      status: 403,
      headers: { "Content-Type": "text/plain" },
    });
  }

  return context.next();
};
functions/ is outside _site/: Cloudflare Pages Functions live in a functions/ directory at the project root — not inside your publish directory. The _middleware.ts file in functions/ intercepts all requests before the static files are served.
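All three platform functions share the same BOT_PATTERN regex, so it is worth sanity-checking it against realistic User-Agent strings before deploying. A plain-TypeScript check (the sample UA strings below are illustrative):

```typescript
// The shared UA pattern from the edge function examples above.
const BOT_PATTERN =
  /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot/i;

export function isBlockedBot(userAgent: string): boolean {
  return BOT_PATTERN.test(userAgent);
}

// Bots embed their name inside a longer UA string, so a substring match
// with the `i` flag is enough. Note that plain Googlebot is deliberately
// NOT in the list — only Google-Extended (the AI training agent) is.
```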

Deno Deploy bot-blocking entrypoint

Lume has a first-class integration with Deno Deploy. When using a custom server entrypoint, you can implement bot blocking before serving static files:

// server.ts (Deno Deploy entrypoint)
import { serveDir } from "jsr:@std/http/file-server";

const BOT_PATTERN =
  /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|AhrefsBot|Bytespider|Amazonbot|Diffbot|FacebookBot|cohere-ai|PerplexityBot|YouBot/i;

Deno.serve(async (req: Request): Promise<Response> => {
  const ua = req.headers.get("user-agent") ?? "";

  // Block AI bots before serving any file
  if (BOT_PATTERN.test(ua)) {
    return new Response("Forbidden", {
      status: 403,
      headers: {
        "Content-Type": "text/plain",
        "X-Robots-Tag": "noai, noimageai",
      },
    });
  }

  // Inject X-Robots-Tag on all served responses
  const res = await serveDir(req, {
    fsRoot: "_site",
    urlRoot: "",
    quiet: true,
  });

  const headers = new Headers(res.headers);
  headers.set("X-Robots-Tag", "noai, noimageai");

  return new Response(res.body, {
    status: res.status,
    statusText: res.statusText,
    headers,
  });
});
// deno.json — deployment tasks
{
  "tasks": {
    "build": "deno run -A https://deno.land/x/lume/ci.ts",
    "serve": "deno run -A https://deno.land/x/lume/cli.ts --serve",
    "deploy": "deployctl deploy --project=my-lume-site server.ts"
  }
}
serveDir response immutability: serveDir() returns a Response object whose headers you cannot modify in place. Create a new Response with new Headers(res.headers), set your header, then wrap the original res.body in a new Response. The body stream passes through without buffering.
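The wrap-and-copy step can be factored into a small helper. A sketch that works with any Web-standard Response (withHeader is a name chosen here, not a std library function):

```typescript
// Return a new Response identical to `res` but with one extra header.
// Needed because the headers on a Response from serveDir may be immutable.
export function withHeader(res: Response, name: string, value: string): Response {
  const headers = new Headers(res.headers);  // copy is mutable
  headers.set(name, value);
  // Wrapping res.body streams it through without buffering.
  return new Response(res.body, {
    status: res.status,
    statusText: res.statusText,
    headers,
  });
}
```

In server.ts the tail of the handler then collapses to: return withHeader(await serveDir(req, { fsRoot: "_site" }), "X-Robots-Tag", "noai, noimageai").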

Deploy to Deno Deploy with GitHub Actions

# .github/workflows/deploy.yml
name: Deploy to Deno Deploy
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: denoland/setup-deno@v2
        with:
          deno-version: v2.x
      - name: Build Lume site
        run: deno task build
      - name: Deploy to Deno Deploy
        uses: denoland/deployctl@v1
        with:
          project: my-lume-site
          entrypoint: server.ts

Full _config.ts example

// _config.ts
import lume from "lume/mod.ts";
import nunjucks from "lume/plugins/nunjucks.ts";
import markdown from "lume/plugins/markdown.ts";
import jsx from "lume/plugins/jsx.ts";
import sitemap from "lume/plugins/sitemap.ts";
import minifyHTML from "lume/plugins/minify_html.ts";

const site = lume({
  src: "./src",
  dest: "./_site",
  location: new URL("https://yourdomain.com"),
});

// Plugins
site.use(nunjucks());
site.use(markdown());
site.use(jsx());
site.use(sitemap());
site.use(minifyHTML());

// Copy static files as-is to _site/
site.copy("robots.txt");
site.copy("_headers");    // Cloudflare Pages headers
site.copy("favicon.ico");
site.copy("static", ".");  // src/static/ → _site/ (merge at root)

// Global data available in all templates
site.data("robots", "noai, noimageai");   // default robots value
site.data("site", {
  title: "My Lume Site",
  url: "https://yourdomain.com",
});

export default site;
site.data() for global defaults: site.data("robots", "noai, noimageai") sets a global data value available in all templates as {{ robots }} (Nunjucks) or page.data.robots (JSX). Per-page front matter overrides it. Alternatively, use _data.yml in the source root — both approaches work, but site.data() is more visible and IDE-friendly.
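The cascade's precedence order behaves like a plain object spread — later, more specific sources win. A toy model of the resolution (resolvePageData is illustrative, not Lume's actual implementation):

```typescript
// Toy model of the Lume data cascade: site-wide defaults are overridden
// by directory _data files, which are overridden by page front matter.
type Data = Record<string, unknown>;

export function resolvePageData(
  siteData: Data,     // site.data(...) in _config.ts
  dirData: Data,      // _data.yml / _data.ts in the page's directory
  frontMatter: Data,  // the page's own front matter
): Data {
  return { ...siteData, ...dirData, ...frontMatter };
}
```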

Deployment comparison

| Platform | robots.txt | X-Robots-Tag | Hard 403 | Notes |
| --- | --- | --- | --- | --- |
| Deno Deploy | serveDir auto-serves it | server.ts response headers | server.ts UA check | Native Lume target; full control |
| Netlify | Copied to _site/ | netlify.toml [[headers]] | Edge Function | netlify.toml publish = "_site" |
| Vercel | Copied to _site/ | vercel.json headers | middleware.ts (project root) | outputDirectory: "_site" |
| Cloudflare Pages | Copied to _site/ | _headers file (copy it) | functions/_middleware.ts | _headers needs site.copy("_headers") |
| GitHub Pages | Copied to _site/ | Not supported | Not supported | No custom headers; noai meta only |
| Firebase Hosting | Copied to _site/ | firebase.json headers | Cloud Functions rewrite | public: "_site" in firebase.json |

FAQ

How do I add robots.txt to a Lume site?

Two options: (1) Static copy — place robots.txt in your source directory and add site.copy("robots.txt") in _config.ts. Lume copies it verbatim to _site/. (2) Dynamic route — create src/robots.txt.ts that exports url: "/robots.txt" and a default function returning the file content as a string. If both exist, the page route wins.

How do I add noai meta tags to a Lume layout?

Add <meta name="robots" content="{{ robots | default('noai, noimageai') }}"> to your base layout in _includes/. Override per-page with robots: index, follow in front matter. For a global default without touching every layout, use site.data("robots", "noai, noimageai") in _config.ts.

How do I add X-Robots-Tag to a Lume static site?

Lume has no server process in production — headers are set at the hosting layer. Netlify: [[headers]] in netlify.toml. Vercel: headers() in vercel.json. Cloudflare Pages: _headers file copied via site.copy("_headers"). Deno Deploy: set headers in your server.ts entrypoint.

How do I hard-block AI bots with a 403 on a Lume site?

Hard 403 needs server-side execution. Netlify: Edge Function in netlify/edge-functions/. Vercel: middleware.ts in the project root (not inside _site/). Cloudflare Pages: functions/_middleware.ts. Deno Deploy: check User-Agent in server.ts before calling serveDir().

What is the output directory in Lume?

_site/ by default. Change with lume({ dest: "dist" }) in _config.ts. Point your hosting platform's publish directory to whichever value you set — Netlify: publish = "_site", Vercel: outputDirectory: "_site".

How does bot blocking work on Deno Deploy with a Lume site?

Use a custom server.ts entrypoint that checks req.headers.get("user-agent") before calling serveDir(req, { fsRoot: "_site" }). Matching bots return new Response("Forbidden", { status: 403 }) immediately. Legitimate requests pass through to the static file server.
