Hexo · Node.js · Static Site · 9 min read

How to Block AI Bots on Hexo: Complete 2026 Guide

Hexo generates a static site — no server process at runtime. Bot blocking splits across the content layer (Hexo source files and themes) and the hosting platform layer. This guide covers every technique: robots.txt, noai meta tags, X-Robots-Tag headers, and hard 403 blocking via edge functions.

robots.txt in source/
Generator plugin for dynamic robots.txt
noai meta tag in theme layouts
Per-page override via front matter
X-Robots-Tag via hosting platform
Hard 403 via edge functions
Deployment quick-reference
FAQ

robots.txt in source/

Hexo copies everything in source/ to public/ on hexo generate. Place robots.txt in source/robots.txt and it will appear at the root of your deployed site — no plugin or configuration required.

Gotcha — existing hexo-generator-robots: If you have hexo-generator-robots installed, it generates robots.txt automatically from _config.yml. Your manual source/robots.txt will conflict with it. Either uninstall the plugin or remove your manual file and configure the plugin directly in _config.yml.

source/robots.txt

User-agent: *
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /

Sitemap: https://example.com/sitemap.xml

After hexo generate, verify the file exists at public/robots.txt.
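That check can also be automated. The following is a minimal sketch, not part of Hexo: the function name findUnblockedBots and the sample text are assumptions made for illustration. It parses robots.txt text into user-agent groups and reports any bot from a list that has no explicit Disallow: / rule.

```javascript
// Hypothetical helper (not a Hexo API): returns the names from `bots`
// that have no explicit "Disallow: /" rule in the given robots.txt text.
function findUnblockedBots(robotsTxt, bots) {
  const blocked = new Set();
  let currentAgents = [];
  let lastWasAgent = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
    if (!line) continue;

    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === 'user-agent') {
      // Consecutive User-agent lines share one group of rules.
      if (!lastWasAgent) currentAgents = [];
      currentAgents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      if (field === 'disallow' && value === '/') {
        currentAgents.forEach((agent) => blocked.add(agent));
      }
      lastWasAgent = false;
    }
  }

  return bots.filter((bot) => !blocked.has(bot.toLowerCase()));
}

// Inline sample; a real check would read the built file instead,
// e.g. fs.readFileSync('public/robots.txt', 'utf8').
const sample = [
  'User-agent: *',
  'Allow: /',
  '',
  'User-agent: GPTBot',
  'Disallow: /',
].join('\n');

console.log(findUnblockedBots(sample, ['GPTBot', 'ClaudeBot'])); // [ 'ClaudeBot' ]
```

The sketch only looks at groups that name a bot explicitly; it does not evaluate wildcard groups or partial-path rules.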

Generator plugin for dynamic robots.txt

For environment-based robots.txt (strict blocking in prod, permissive in staging), write a Hexo generator plugin. Create a scripts/ directory in your Hexo root — Hexo auto-loads all .js files in scripts/.

scripts/robots-generator.js

'use strict';

hexo.extend.generator.register('robots', function (locals) {
  const isProd = process.env.NODE_ENV === 'production';

  const aiBlock = [
    'User-agent: GPTBot',
    'Disallow: /',
    '',
    'User-agent: ClaudeBot',
    'Disallow: /',
    '',
    'User-agent: anthropic-ai',
    'Disallow: /',
    '',
    'User-agent: CCBot',
    'Disallow: /',
    '',
    'User-agent: Google-Extended',
    'Disallow: /',
  ].join('\n');

  const permissive = [
    'User-agent: GPTBot',
    'Allow: /',
  ].join('\n');

  const content = [
    'User-agent: *',
    'Allow: /',
    '',
    isProd ? aiBlock : permissive,
    '',
    // Strip any trailing slash from the configured url to avoid "//sitemap.xml"
    `Sitemap: ${hexo.config.url.replace(/\/+$/, '')}/sitemap.xml`,
  ].join('\n');

  return {
    path: 'robots.txt',
    data: content,
  };
});

Delete source/robots.txt if you use a generator. Both would target public/robots.txt, and you should not rely on which one ends up in the build. The same applies to a robots.txt plugin: keep exactly one source for the file.
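The hard-coded arrays in the generator could also be built from a single list of bot names. The sketch below is one possible variant, not the original author's code: buildRobotsTxt and its parameter names are hypothetical, and unlike the script above it applies the permissive Allow rule to every listed bot outside production. The Hexo registration is guarded so the helper can also be run or tested outside Hexo.

```javascript
// Hypothetical helper: builds robots.txt text from a list of bot names.
function buildRobotsTxt({ bots, blockAi, siteUrl }) {
  const rule = blockAi ? 'Disallow: /' : 'Allow: /';
  const groups = bots.map((bot) => `User-agent: ${bot}\n${rule}`);
  const base = siteUrl.replace(/\/+$/, ''); // avoid "//sitemap.xml"

  return [
    'User-agent: *\nAllow: /',
    ...groups,
    `Sitemap: ${base}/sitemap.xml`,
  ].join('\n\n') + '\n';
}

const AI_BOTS = ['GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot', 'Google-Extended'];

// Register the generator only when running inside Hexo, where the
// global `hexo` object exists; elsewhere the helper is still usable.
if (typeof hexo !== 'undefined') {
  hexo.extend.generator.register('robots', () => ({
    path: 'robots.txt',
    data: buildRobotsTxt({
      bots: AI_BOTS,
      blockAi: process.env.NODE_ENV === 'production',
      siteUrl: hexo.config.url,
    }),
  }));
}
```

Keeping the list in one array means adding a crawler is a one-line change, and the pure function can be checked without running a full build.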

Run with environment variable

# Production build — AI bots blocked
NODE_ENV=production npx hexo generate

# Staging build — permissive
npx hexo generate

noai meta tag in theme layouts

To add the noai meta tag to every page, edit your theme's head partial. Hexo themes use EJS, Nunjucks, Pug, or Swig — the approach is the same in each.

Which file to edit? Look for themes/[theme]/layout/_partial/head.ejs (or head.njk, head.pug). If your theme uses a single layout file, look for layout.ejs in the theme's layout/ directory.

EJS theme (themes/[theme]/layout/_partial/head.ejs)

<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title><%= page.title ? page.title + ' | ' : '' %><%= config.title %></title>

  <!-- AI bot blocking -->
  <meta name="robots" content="<%= page.robots || 'noai, noimageai' %>">

  <%- partial('head/open_graph') %>
  <%- css('css/style') %>
</head>

Nunjucks theme (themes/[theme]/layout/_partial/head.njk)

<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>{{ page.title + ' | ' if page.title }}{{ config.title }}</title>

  <!-- AI bot blocking -->
  <meta name="robots" content="{{ page.robots or 'noai, noimageai' }}">

  {{ partial('head/open_graph') }}
  {{ css('css/style') }}
</head>

Pug theme (themes/[theme]/layout/_partial/head.pug)

head
  meta(charset='UTF-8')
  meta(name='viewport', content='width=device-width, initial-scale=1.0')
  title= (page.title ? page.title + ' | ' : '') + config.title

  //- AI bot blocking
  meta(name='robots', content=page.robots || 'noai, noimageai')

  != partial('head/open_graph')
  != css('css/style')

The page.robots || 'noai, noimageai' pattern sets a default of noai, noimageai for all pages, with per-page override via front matter (see next section).
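If several layouts need the same tag, the fallback logic could live in a Hexo helper instead of being repeated in each template. This is a sketch under assumptions: the helper name robots_meta and the function robotsMetaTag are made up for illustration, while hexo.extend.helper.register is Hexo's helper registration API. The file would go in scripts/ like the generator above.

```javascript
const DEFAULT_ROBOTS = 'noai, noimageai';

// Hypothetical helper: picks the page's robots value or the default and
// escapes it for safe use inside an HTML attribute.
function robotsMetaTag(page, fallback = DEFAULT_ROBOTS) {
  const raw = page && typeof page.robots === 'string' && page.robots.trim()
    ? page.robots.trim()
    : fallback;
  const escaped = raw
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  return `<meta name="robots" content="${escaped}">`;
}

// Inside Hexo, expose it to templates as robots_meta(). Helpers run with
// the template locals as `this`, so `this.page` is the current page.
if (typeof hexo !== 'undefined') {
  hexo.extend.helper.register('robots_meta', function () {
    return robotsMetaTag(this.page);
  });
}

console.log(robotsMetaTag({ title: 'My Post' }));
// <meta name="robots" content="noai, noimageai">
console.log(robotsMetaTag({ robots: 'index, follow' }));
// <meta name="robots" content="index, follow">
```

In an EJS layout the helper would be called with unescaped output, for example <%- robots_meta() %>, since it already returns a complete tag.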

Per-page override via front matter

Once your theme reads page.robots, you can override the robots value for any individual post or page via its front matter:

Block AI on all pages (default via theme)

---
title: My Post
date: 2026-01-01
# No robots field — theme default applies: "noai, noimageai"
---

Override on a specific page — allow indexing but no training

---
title: My Public Post
date: 2026-01-01
robots: index, follow, noai, noimageai
---

Override on a specific page — allow everything (e.g. landing page)

---
title: Landing Page
date: 2026-01-01
robots: index, follow
---

Override on a specific page — block everything

---
title: Private Content
date: 2026-01-01
robots: noindex, noai, noimageai
---

Hexo default theme (landscape): The default landscape theme does not read page.robots. You must add the meta tag yourself to the theme's head.ejs partial, or switch to a community theme that supports it.
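Because any page can opt out of the default through front matter, it can help to list which pages do. The sketch below is illustrative only: parseDirectives, pagesAllowingAi, and the sample page objects are assumptions, shaped like the front matter examples above.

```javascript
// Hypothetical helper: splits a robots value such as
// "index, follow, noai" into lowercase directive tokens.
function parseDirectives(value) {
  return String(value || '')
    .split(',')
    .map((token) => token.trim().toLowerCase())
    .filter(Boolean);
}

// Returns the paths of pages whose explicit robots value drops "noai".
// Pages without a robots field are skipped because the theme default applies.
function pagesAllowingAi(pages) {
  return pages
    .filter((page) => page.robots != null)
    .filter((page) => !parseDirectives(page.robots).includes('noai'))
    .map((page) => page.path);
}

const samplePages = [
  { path: 'my-post/' },
  { path: 'public-post/', robots: 'index, follow, noai, noimageai' },
  { path: 'landing/', robots: 'index, follow' },
  { path: 'private/', robots: 'noindex, noai, noimageai' },
];

console.log(pagesAllowingAi(samplePages)); // [ 'landing/' ]
```

Inside a Hexo script, the input could plausibly come from hexo.locals.get('posts').toArray() once the site data has loaded; treat that wiring as an assumption to verify against your Hexo version.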

X-Robots-Tag via hosting platform

X-Robots-Tag is an HTTP response header. Hexo generates static files — there's no server to inject headers at runtime. Add the header at the hosting layer.

Netlify — netlify.toml

[[headers]]
  for = "/*"
  [headers.values]
    X-Robots-Tag = "noai, noimageai"

Vercel — vercel.json

{
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        {
          "key": "X-Robots-Tag",
          "value": "noai, noimageai"
        }
      ]
    }
  ]
}

Place vercel.json in your Hexo project root, not in source/. It is a deployment config file, not a static asset.

Cloudflare Pages — source/_headers

Cloudflare Pages reads a _headers file from the root of your published directory. Since Hexo copies source/ to public/, place the file in source/_headers:

/*
  X-Robots-Tag: noai, noimageai

After hexo generate, confirm public/_headers exists before deploying.

GitHub Pages

GitHub Pages does not support custom HTTP headers, so X-Robots-Tag cannot be set there. robots.txt and the noai meta tag (in your theme) are your only options. For header-based or hard blocking, migrate to Netlify, Vercel, or Cloudflare Pages.
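On hosts that do support custom headers, it is worth confirming after deployment that the header is actually served. A minimal sketch, assuming Node.js 18+ for the built-in fetch; hasDirectives and checkSite are hypothetical names, and the URL you pass in is your own site.

```javascript
// Hypothetical helper: true when an X-Robots-Tag header value contains
// every required directive (case-insensitive, comma-separated).
function hasDirectives(headerValue, required = ['noai', 'noimageai']) {
  const present = String(headerValue || '')
    .split(',')
    .map((token) => token.trim().toLowerCase());
  return required.every((directive) => present.includes(directive));
}

// Fetches a URL and checks its X-Robots-Tag response header.
// Uses the fetch built into Node.js 18+; not called here because it
// needs network access.
async function checkSite(url) {
  const response = await fetch(url, { method: 'HEAD' });
  return hasDirectives(response.headers.get('x-robots-tag'));
}

console.log(hasDirectives('noai, noimageai')); // true
console.log(hasDirectives('noindex'));         // false
console.log(hasDirectives(null));              // false
```

To run it against a deployment, call checkSite with your site's URL and log the resolved boolean, for example checkSite('https://example.com/').then(console.log).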

Hard 403 via edge functions

For hard UA-based blocking (returning 403 before serving any content), use an edge function at your hosting provider.

Netlify Edge Function

Create netlify/edge-functions/block-ai-bots.ts in your Hexo project root:

import type { Context } from '@netlify/edge-functions';

const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export default async function handler(
  request: Request,
  context: Context
): Promise<Response> {
  const ua = request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
}

export const config = { path: '/*' };

netlify.toml

[build]
  command = "npx hexo generate"
  publish = "public"

[[edge_functions]]
  path = "/*"
  function = "block-ai-bots"

Cloudflare Pages Functions

Create functions/_middleware.ts in your Hexo project root (Cloudflare Pages Functions live outside source/):

import type { PagesFunction } from '@cloudflare/workers-types';

const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export const onRequest: PagesFunction = async (context) => {
  const ua = context.request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
};

Cloudflare Pages build config: Set the build command to npx hexo generate and the output directory to public in the Cloudflare Pages dashboard. The functions/ directory is picked up automatically.

Vercel Edge Middleware

Create middleware.ts in the Hexo project root:

import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

const AI_BOTS = [
  'gptbot', 'claudebot', 'anthropic-ai', 'ccbot',
  'google-extended', 'ahrefsbot', 'bytespider',
  'amazonbot', 'diffbot', 'facebookbot', 'cohere-ai',
  'perplexitybot', 'youbot',
];

export function middleware(request: NextRequest) {
  const ua = (request.headers.get('user-agent') || '').toLowerCase();
  if (AI_BOTS.some((bot) => ua.includes(bot))) {
    return new NextResponse('Forbidden', { status: 403 });
  }
  return NextResponse.next();
}

export const config = {
  matcher: ['/((?!_next/static|_next/image|favicon.ico).*)'],
};

Vercel + static Hexo: The example above imports from next/server, so it only works when the next package is part of the project. For a plain static Hexo export, prefer Netlify Edge Functions or Cloudflare Pages Functions, which work natively with static sites.
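All three handlers repeat the same user-agent check, so the matching logic could be pulled into one shared function and tested on its own. The sketch below is an illustration rather than any platform's API: isAiBot and handle are hypothetical names, and next stands in for whatever "continue" call the platform provides. It uses the standard Request and Response types available in Node.js 18+ and in edge runtimes.

```javascript
const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

// Lowercase the list once instead of on every request.
const AI_BOTS_LOWER = AI_BOTS.map((bot) => bot.toLowerCase());

// Hypothetical shared helper: true when the User-Agent string contains
// any listed bot name (case-insensitive substring match).
function isAiBot(userAgent) {
  const ua = (userAgent || '').toLowerCase();
  return AI_BOTS_LOWER.some((bot) => ua.includes(bot));
}

// Platform-neutral handler. `next` is a placeholder for the platform's
// pass-through call, such as context.next() in the examples above.
async function handle(request, next) {
  if (isAiBot(request.headers.get('user-agent'))) {
    return new Response('Forbidden', { status: 403 });
  }
  return next();
}

console.log(isAiBot('Mozilla/5.0 (compatible; GPTBot/1.0)'));         // true
console.log(isAiBot('Mozilla/5.0 (Windows NT 10.0) Chrome/120.0.0')); // false
```

Substring matching is deliberately loose: it catches version suffixes such as GPTBot/1.0, but it also trusts whatever the client sends, so it only stops crawlers that identify themselves honestly.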

Deployment quick-reference

Platform	Build command	Publish dir	Custom headers	Edge functions
Netlify	`npx hexo generate`	`public`	✅ netlify.toml	✅ netlify/edge-functions/
Vercel	`npx hexo generate`	`public`	✅ vercel.json	⚠️ Next.js required
Cloudflare Pages	`npx hexo generate`	`public`	✅ source/_headers	✅ functions/_middleware.ts
GitHub Pages	`npx hexo generate --deploy`	`public`	🚫 not supported	🚫 not supported
Firebase Hosting	`npx hexo generate`	`public`	✅ firebase.json headers	✅ Cloud Functions
AWS S3 + CloudFront	`npx hexo generate`	`public`	✅ CloudFront response headers policy	✅ Lambda@Edge

_config.yml excerpt

# _config.yml
url: https://example.com
root: /

# If using hexo-generator-feed (RSS) — note this doesn't affect robots
feed:
  type: atom
  path: atom.xml

# If using hexo-generator-sitemap
sitemap:
  path: sitemap.xml
  tag: false
  category: false

package.json scripts

{
  "scripts": {
    "build": "hexo generate",
    "build:prod": "NODE_ENV=production hexo generate",
    "clean": "hexo clean",
    "server": "hexo server",
    "deploy": "hexo deploy"
  }
}

FAQ

Does Hexo have built-in robots.txt support?

Yes. Any file placed in source/ is copied to public/ on build — including robots.txt. No plugin or configuration required. Create source/robots.txt and run hexo generate.

How do I add the noai meta tag to every Hexo page?

Edit your theme's layout file (themes/[theme]/layout/_partial/head.ejs or head.njk). Add <meta name="robots" content="noai, noimageai"> inside the <head> block. For per-page override, use front matter (robots: index, noai) and a conditional in the template (<%= page.robots || 'noai, noimageai' %>).

Can I block AI bots with a Hexo plugin?

Yes — write a generator plugin (hexo.extend.generator.register) to programmatically produce robots.txt at build time. Place it in the scripts/ directory in your Hexo root — Hexo auto-loads all .js files in scripts/.

How do I add X-Robots-Tag on Hexo sites?

X-Robots-Tag is an HTTP response header, and Hexo generates static files rather than running a server. Add the header at the hosting layer: Netlify [[headers]] in netlify.toml; Vercel "headers" array in vercel.json; Cloudflare Pages _headers file in source/ (copied to public/).

Does GitHub Pages support X-Robots-Tag for Hexo sites?

No. GitHub Pages does not allow custom HTTP headers. For X-Robots-Tag or hard 403 blocking, migrate to Netlify, Vercel, or Cloudflare Pages — all support custom headers and edge functions.

How do I block AI bots on a Hexo site hosted on Cloudflare Pages?

Two options: (1) Create source/_headers with X-Robots-Tag: noai, noimageai for all paths — Hexo copies it to public/_headers on build. (2) Add a functions/_middleware.ts Cloudflare Pages Function that checks the User-Agent header and returns a 403 for known AI crawlers.
