
How to Block AI Bots on Hexo: Complete 2026 Guide

Hexo generates a static site — no server process at runtime. Bot blocking splits across the content layer (Hexo source files and themes) and the hosting platform layer. This guide covers every technique: robots.txt, noai meta tags, X-Robots-Tag headers, and hard 403 blocking via edge functions.

robots.txt in source/

Hexo copies everything in source/ to public/ on hexo generate. Place robots.txt in source/robots.txt and it will appear at the root of your deployed site — no plugin or configuration required.

Gotcha — existing hexo-generator-robots: If you have hexo-generator-robots installed, it generates robots.txt automatically from _config.yml. Your manual source/robots.txt will conflict with it. Either uninstall the plugin or remove your manual file and configure the plugin directly in _config.yml.

source/robots.txt

User-agent: *
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /

Sitemap: https://example.com/sitemap.xml

After hexo generate, verify the file exists at public/robots.txt.
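Beyond checking that the file exists, it can help to confirm that a given crawler token really is disallowed. The sketch below is a deliberately simplified robots.txt reader, not a full parser: it matches tokens exactly (case-insensitively), ignores wildcards in paths, and only looks for a site-wide `Disallow: /`. The function names `parseRobotsGroups` and `isFullyDisallowed` are made up for this example.

```javascript
// Minimal robots.txt check: does a crawler token get a site-wide "Disallow: /"?
// Simplified on purpose: exact token match, no path wildcards.
function parseRobotsGroups(text) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
    if (!line) continue;

    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === 'user-agent') {
      // Consecutive User-agent lines share one group of rules.
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if (field === 'allow' || field === 'disallow') {
      if (current) current.rules.push({ field, value });
      lastWasAgent = false;
    } else {
      lastWasAgent = false; // e.g. a Sitemap line
    }
  }
  return groups;
}

function isFullyDisallowed(text, token) {
  const wanted = token.toLowerCase();
  const groups = parseRobotsGroups(text);
  // Prefer a group that names the token; otherwise fall back to "*".
  const group =
    groups.find((g) => g.agents.includes(wanted)) ||
    groups.find((g) => g.agents.includes('*'));
  if (!group) return false;
  return group.rules.some((r) => r.field === 'disallow' && r.value === '/');
}
```

In a build check you would pass in the contents of public/robots.txt (for example via `fs.readFileSync`) and call `isFullyDisallowed(contents, 'GPTBot')` for each token you care about.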

Generator plugin for dynamic robots.txt

For environment-based robots.txt (strict blocking in prod, permissive in staging), write a Hexo generator plugin. Create a scripts/ directory in your Hexo root — Hexo auto-loads all .js files in scripts/.

scripts/robots-generator.js

'use strict';

hexo.extend.generator.register('robots', function (locals) {
  const isProd = process.env.NODE_ENV === 'production';

  const aiBlock = [
    'User-agent: GPTBot',
    'Disallow: /',
    '',
    'User-agent: ClaudeBot',
    'Disallow: /',
    '',
    'User-agent: anthropic-ai',
    'Disallow: /',
    '',
    'User-agent: CCBot',
    'Disallow: /',
    '',
    'User-agent: Google-Extended',
    'Disallow: /',
  ].join('\n');

  const permissive = [
    'User-agent: GPTBot',
    'Allow: /',
  ].join('\n');

  const content = [
    'User-agent: *',
    'Allow: /',
    '',
    isProd ? aiBlock : permissive,
    '',
    `Sitemap: ${hexo.config.url}/sitemap.xml`,
  ].join('\n');

  return {
    path: 'robots.txt',
    data: content,
  };
});
Delete source/robots.txt if you use a generator: both would target the same output path, public/robots.txt, and only one can end up there. The same applies to a robots.txt plugin. Keep a single source for the file.
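As a variant of the generator above (use one or the other, not both), the rules can be built from a list of crawler tokens so the list lives in one place. This is a sketch under assumptions: `buildRobotsTxt` and `AI_BOTS` are names invented here, and the staging output simply omits the AI rules instead of emitting an explicit `Allow` for GPTBot, which should be equivalent given the `User-agent: *` group.

```javascript
// Hypothetical helper: builds robots.txt text from a list of crawler tokens.
const AI_BOTS = ['GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot', 'Google-Extended'];

function buildRobotsTxt({ siteUrl, blockAi, bots = AI_BOTS }) {
  const lines = ['User-agent: *', 'Allow: /', ''];

  if (blockAi) {
    for (const bot of bots) {
      lines.push(`User-agent: ${bot}`, 'Disallow: /', '');
    }
  }

  // Strip trailing slashes so the sitemap URL never contains "//sitemap.xml".
  const base = String(siteUrl).replace(/\/+$/, '');
  lines.push(`Sitemap: ${base}/sitemap.xml`);
  return lines.join('\n') + '\n';
}

// Register only when Hexo loads this file from scripts/,
// where the `hexo` global is defined.
if (typeof hexo !== 'undefined') {
  hexo.extend.generator.register('robots', function () {
    return {
      path: 'robots.txt',
      data: buildRobotsTxt({
        siteUrl: hexo.config.url,
        blockAi: process.env.NODE_ENV === 'production',
      }),
    };
  });
}
```

Keeping the string building in a plain function means it can be unit-tested without starting Hexo.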

Run with environment variable

# Production build — AI bots blocked
NODE_ENV=production npx hexo generate

# Staging build — permissive
npx hexo generate

noai meta tag in theme layouts

To add the noai meta tag to every page, edit your theme's head partial. Hexo themes use EJS, Nunjucks, Pug, or Swig — the approach is the same in each.

Which file to edit? Look for themes/[theme]/layout/_partial/head.ejs (or head.njk, head.pug). If your theme uses a single layout file, look for layout.ejs in the theme's layout/ directory.

EJS theme (themes/[theme]/layout/_partial/head.ejs)

<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title><%= page.title ? page.title + ' | ' : '' %><%= config.title %></title>

  <!-- AI bot blocking -->
  <meta name="robots" content="<%= page.robots || 'noai, noimageai' %>">

  <%- partial('head/open_graph') %>
  <%- css('css/style') %>
</head>

Nunjucks theme (themes/[theme]/layout/_partial/head.njk)

<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>{{ page.title + ' | ' if page.title }}{{ config.title }}</title>

  <!-- AI bot blocking -->
  <meta name="robots" content="{{ page.robots or 'noai, noimageai' }}">

  {{ partial('head/open_graph') }}
  {{ css('css/style') }}
</head>

Pug theme (themes/[theme]/layout/_partial/head.pug)

head
  meta(charset='UTF-8')
  meta(name='viewport', content='width=device-width, initial-scale=1.0')
  title= (page.title ? page.title + ' | ' : '') + config.title

  //- AI bot blocking
  meta(name='robots', content=page.robots || 'noai, noimageai')

  != partial('head/open_graph')
  != css('css/style')

The page.robots || 'noai, noimageai' pattern sets a default of noai, noimageai for all pages, with per-page override via front matter (see next section).
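If you would rather not repeat the fallback expression in each template, the same default can live in a Hexo helper registered from scripts/. This is a minimal sketch: `robots_meta` is a hypothetical helper name (Hexo does not ship it), and the attribute escaping is an extra precaution added here, not something the templates above do.

```javascript
// Escape characters that could break out of an HTML attribute value.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

const DEFAULT_ROBOTS = 'noai, noimageai';

// Returns the robots meta tag for a page object; front matter fields
// such as `robots` are exposed on `page`.
function robotsMetaTag(page) {
  const content = (page && page.robots) || DEFAULT_ROBOTS;
  return `<meta name="robots" content="${escapeAttr(content)}">`;
}

// `robots_meta` is an assumed name chosen for this example.
// Inside a helper, `this.page` is the page currently being rendered.
if (typeof hexo !== 'undefined') {
  hexo.extend.helper.register('robots_meta', function () {
    return robotsMetaTag(this.page);
  });
}
```

In an EJS theme the helper would be called with unescaped output, `<%- robots_meta() %>`, in place of the hand-written meta tag.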

Per-page override via front matter

Once your theme reads page.robots, you can override the robots value for any individual post or page via its front matter:

Block AI on all pages (default via theme)

---
title: My Post
date: 2026-01-01
# No robots field — theme default applies: "noai, noimageai"
---

Override on a specific page — allow indexing but no training

---
title: My Public Post
date: 2026-01-01
robots: index, follow, noai, noimageai
---

Override on a specific page — allow everything (e.g. landing page)

---
title: Landing Page
date: 2026-01-01
robots: index, follow
---

Override on a specific page — block everything

---
title: Private Content
date: 2026-01-01
robots: noindex, noai, noimageai
---
Hexo default theme (landscape): The default landscape theme does not read page.robots. Add the meta tag yourself to the theme's head.ejs partial, or switch to a community theme that supports it.

X-Robots-Tag via hosting platform

X-Robots-Tag is an HTTP response header. Hexo generates static files — there's no server to inject headers at runtime. Add the header at the hosting layer.

Netlify — netlify.toml

[[headers]]
  for = "/*"
  [headers.values]
    X-Robots-Tag = "noai, noimageai"

Vercel — vercel.json

{
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        {
          "key": "X-Robots-Tag",
          "value": "noai, noimageai"
        }
      ]
    }
  ]
}

Place vercel.json in your Hexo project root, not in source/: it is a deployment config file, not a static asset.

Cloudflare Pages — source/_headers

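Cloudflare Pages reads a _headers file from the root of your published directory. Hexo skips source files and folders whose names start with an underscore, so source/_headers is not copied to public/ by default. Place the file in source/_headers and list it under the include setting in _config.yml (include: ["_headers"]) so that Hexo processes it: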

/*
  X-Robots-Tag: noai, noimageai

After hexo generate, confirm public/_headers exists before deploying.
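An alternative to keeping a static _headers file is to emit it from a generator script, in the same style as the robots.txt generator earlier. The sketch below is an assumption-laden example: `buildHeadersFile`, `HEADER_RULES`, and the generator name `cf_headers` are invented here, and it assumes generator output paths are written to public/ regardless of a leading underscore, which is worth confirming on your Hexo version. Use either this script or a static source/_headers, not both.

```javascript
// Builds a _headers file: a path pattern line followed by
// indented "Name: value" lines, with a blank line between rules.
function buildHeadersFile(rules) {
  return rules
    .map(({ path, headers }) => {
      const headerLines = Object.entries(headers).map(
        ([name, value]) => `  ${name}: ${value}`
      );
      return [path, ...headerLines].join('\n');
    })
    .join('\n\n') + '\n';
}

// Hypothetical rule list; extend it with more paths or headers as needed.
const HEADER_RULES = [
  { path: '/*', headers: { 'X-Robots-Tag': 'noai, noimageai' } },
];

// Register only when Hexo loads this file from scripts/.
if (typeof hexo !== 'undefined') {
  hexo.extend.generator.register('cf_headers', function () {
    return { path: '_headers', data: buildHeadersFile(HEADER_RULES) };
  });
}
```

Either way, the check above still applies: confirm public/_headers exists after hexo generate.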

GitHub Pages

GitHub Pages does not support custom HTTP headers, so X-Robots-Tag is not possible. Beyond robots.txt, the noai meta tag (in your theme) is your only option. For header-based or hard blocking, migrate to Netlify, Vercel, or Cloudflare Pages.

Hard 403 via edge functions

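For hard UA-based blocking (returning 403 before serving any content), use an edge function at your hosting provider. Each function matches the request's User-Agent header against a list of crawler names. Note that Google-Extended is a robots.txt control token only: Google does not send it as a User-Agent, so that entry in the lists below never matches, and robots.txt remains the way to opt out.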
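The matching logic shared by the platform examples can be isolated as a small pure function, which makes it easy to test before wiring it into any edge runtime. This is a sketch, not any platform's API: `isAiBot` and `AI_BOT_TOKENS` are names chosen here, and the list mirrors this guide's list minus Google-Extended, which is a robots.txt token and does not appear in User-Agent strings.

```javascript
// Crawler names to match as case-insensitive substrings of the User-Agent.
const AI_BOT_TOKENS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'AhrefsBot', 'Bytespider', 'Amazonbot', 'Diffbot',
  'FacebookBot', 'cohere-ai', 'PerplexityBot', 'YouBot',
];

// Lowercase the tokens once at load time, not on every request.
const LOWERED_TOKENS = AI_BOT_TOKENS.map((token) => token.toLowerCase());

function isAiBot(userAgent) {
  if (!userAgent) return false; // missing or empty header: treat as not a bot
  const ua = userAgent.toLowerCase();
  return LOWERED_TOKENS.some((token) => ua.includes(token));
}
```

Substring matching is easy to evade by a client that sends a different User-Agent, so this blocks crawlers that identify themselves honestly and nothing more.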

Netlify Edge Function

Create netlify/edge-functions/block-ai-bots.ts in your Hexo project root:

import type { Context } from '@netlify/edge-functions';

const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export default async function handler(
  request: Request,
  context: Context
): Promise<Response> {
  const ua = request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
}

export const config = { path: '/*' };

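The inline config export above already declares the path for the function. To set the build settings, and to declare the route in netlify.toml instead if you prefer (one declaration is enough), use: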

[build]
  command = "npx hexo generate"
  publish = "public"

[[edge_functions]]
  path = "/*"
  function = "block-ai-bots"

Cloudflare Pages Functions

Create functions/_middleware.ts in your Hexo project root (Cloudflare Pages Functions live outside source/):

import type { PagesFunction } from '@cloudflare/workers-types';

const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export const onRequest: PagesFunction = async (context) => {
  const ua = context.request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
};
Cloudflare Pages build config: Set the build command to npx hexo generate and the output directory to public in the Cloudflare Pages dashboard. The functions/ directory is picked up automatically.

Vercel Edge Middleware

Create middleware.ts in the Hexo project root:

import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

const AI_BOTS = [
  'gptbot', 'claudebot', 'anthropic-ai', 'ccbot',
  'google-extended', 'ahrefsbot', 'bytespider',
  'amazonbot', 'diffbot', 'facebookbot', 'cohere-ai',
  'perplexitybot', 'youbot',
];

export function middleware(request: NextRequest) {
  const ua = (request.headers.get('user-agent') || '').toLowerCase();
  if (AI_BOTS.some((bot) => ua.includes(bot))) {
    return new NextResponse('Forbidden', { status: 403 });
  }
  return NextResponse.next();
}

export const config = {
  matcher: ['/((?!_next/static|_next/image|favicon.ico).*)'],
};
Vercel + static Hexo: The middleware above imports from next/server, so as written it only runs in a Next.js deployment. For a plain static Hexo export, prefer Netlify Edge Functions or Cloudflare Pages Functions, which work natively with static sites.

Deployment quick-reference

| Platform | Build command | Publish dir | Custom headers | Edge functions |
| --- | --- | --- | --- | --- |
| Netlify | npx hexo generate | public | ✅ netlify.toml | ✅ netlify/edge-functions/ |
| Vercel | npx hexo generate | public | ✅ vercel.json | ⚠️ Next.js required |
| Cloudflare Pages | npx hexo generate | public | ✅ source/_headers | ✅ functions/_middleware.ts |
| GitHub Pages | npx hexo generate --deploy | public | 🚫 not supported | 🚫 not supported |
| Firebase Hosting | npx hexo generate | public | ✅ firebase.json headers | ✅ Cloud Functions |
| AWS S3 + CloudFront | npx hexo generate | public | ✅ CloudFront response policy | ✅ Lambda@Edge |

_config.yml excerpt

# _config.yml
url: https://example.com
root: /

# If using hexo-generator-feed (RSS) — note this doesn't affect robots
feed:
  type: atom
  path: atom.xml

# If using hexo-generator-sitemap
sitemap:
  path: sitemap.xml
  tag: false
  category: false

package.json scripts

{
  "scripts": {
    "build": "hexo generate",
    "build:prod": "NODE_ENV=production hexo generate",
    "clean": "hexo clean",
    "server": "hexo server",
    "deploy": "hexo deploy"
  }
}

FAQ

Does Hexo have built-in robots.txt support?

Yes. Any file placed in source/ is copied to public/ on build — including robots.txt. No plugin or configuration required. Create source/robots.txt and run hexo generate.

How do I add the noai meta tag to every Hexo page?

Edit your theme's layout file (themes/[theme]/layout/_partial/head.ejs or head.njk). Add <meta name="robots" content="noai, noimageai"> inside the <head> block. For per-page override, use front matter (robots: index, noai) and a conditional in the template (<%= page.robots || 'noai, noimageai' %>).

Can I block AI bots with a Hexo plugin?

Yes — write a generator plugin (hexo.extend.generator.register) to programmatically produce robots.txt at build time. Place it in the scripts/ directory in your Hexo root — Hexo auto-loads all .js files in scripts/.

How do I add X-Robots-Tag on Hexo sites?

X-Robots-Tag is an HTTP response header, and Hexo generates static files rather than running a server. Add it at the hosting layer: Netlify [[headers]] in netlify.toml; Vercel headers array in vercel.json; Cloudflare Pages _headers file in source/ (listed under include in _config.yml so that Hexo copies it to public/).

Does GitHub Pages support X-Robots-Tag for Hexo sites?

No. GitHub Pages does not allow custom HTTP headers. For X-Robots-Tag or hard 403 blocking, migrate to Netlify, Vercel, or Cloudflare Pages — all support custom headers and edge functions.

How do I block AI bots on a Hexo site hosted on Cloudflare Pages?

Two options: (1) Create source/_headers with X-Robots-Tag: noai, noimageai for all paths, and list _headers under include in _config.yml so that Hexo copies it to public/_headers on build (underscore-prefixed source files are skipped by default). (2) Add a functions/_middleware.ts Cloudflare Pages Function that checks the User-Agent header and returns a 403 for known AI crawlers.