How to Block AI Bots on Hexo: Complete 2026 Guide
Hexo generates a static site — no server process at runtime. Bot blocking splits across the content layer (Hexo source files and themes) and the hosting platform layer. This guide covers every technique: robots.txt, noai meta tags, X-Robots-Tag headers, and hard 403 blocking via edge functions.
robots.txt in source/
Hexo copies everything in source/ to public/ on hexo generate. Place robots.txt in source/robots.txt and it will appear at the root of your deployed site — no plugin or configuration required.
If you have hexo-generator-robots installed, it generates robots.txt automatically from _config.yml, and your manual source/robots.txt will conflict with it. Either uninstall the plugin or remove your manual file and configure the plugin directly in _config.yml.
source/robots.txt
User-agent: *
Allow: /
# Block AI training crawlers
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: YouBot
Disallow: /
Sitemap: https://example.com/sitemap.xml
After hexo generate, verify the file exists at public/robots.txt.
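Beyond checking that the file exists, it can help to confirm that each crawler really sits in a group with Disallow: /. The following is a minimal sketch, not a full robots.txt parser; fullyBlockedAgents is a hypothetical helper name, and the sample string is a shortened copy of the file above.

```javascript
// Hypothetical helper: lists the user agents that a robots.txt fully disallows.
function fullyBlockedAgents(robotsTxt) {
  const blocked = new Set();
  let group = [];       // user agents of the group currently being read
  let inRules = false;  // true once the current group has seen a rule line

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // drop comments
    const match = line.match(/^([A-Za-z-]+)\s*:\s*(.*)$/);
    if (!match) continue;
    const field = match[1].toLowerCase();
    const value = match[2].trim();

    if (field === 'user-agent') {
      // A User-agent line after rule lines starts a new group.
      if (inRules) {
        group = [];
        inRules = false;
      }
      group.push(value);
    } else if (field === 'allow' || field === 'disallow') {
      inRules = true;
      if (field === 'disallow' && value === '/') {
        group.forEach((agent) => blocked.add(agent));
      }
    }
  }
  return [...blocked];
}

const sample = [
  'User-agent: *',
  'Allow: /',
  '',
  '# Block AI training crawlers',
  'User-agent: GPTBot',
  'Disallow: /',
  '',
  'User-agent: ClaudeBot',
  'Disallow: /',
  '',
  'Sitemap: https://example.com/sitemap.xml',
].join('\n');

const blockedInSample = fullyBlockedAgents(sample);
console.log(blockedInSample);
```

To run it against a real build, the sample string could be swapped for the contents of public/robots.txt read with fs.readFileSync.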
Generator plugin for dynamic robots.txt
For environment-based robots.txt (strict blocking in prod, permissive in staging), write a Hexo generator plugin. Create a scripts/ directory in your Hexo root — Hexo auto-loads all .js files in scripts/.
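The core of such a plugin is a function that turns an environment flag and a list of crawlers into robots.txt text. A minimal sketch of that idea follows; buildRobotsTxt and its parameter names are hypothetical, and unlike the plugin in this section it emits no permissive GPTBot group for staging. In a real scripts/ file, the result would be returned as data from hexo.extend.generator.register, with the flag taken from process.env.NODE_ENV and the URL from hexo.config.url.

```javascript
// Hypothetical pure function: builds robots.txt text from a list of crawlers.
function buildRobotsTxt({ blockAi, siteUrl, bots }) {
  const lines = ['User-agent: *', 'Allow: /', ''];
  if (blockAi) {
    for (const bot of bots) {
      lines.push(`User-agent: ${bot}`, 'Disallow: /', '');
    }
  }
  // Strip trailing slashes so the sitemap URL never contains "//sitemap.xml".
  lines.push(`Sitemap: ${siteUrl.replace(/\/+$/, '')}/sitemap.xml`);
  return lines.join('\n');
}

const prodRobots = buildRobotsTxt({
  blockAi: true,
  siteUrl: 'https://example.com/',
  bots: ['GPTBot', 'ClaudeBot'],
});
const stagingRobots = buildRobotsTxt({
  blockAi: false,
  siteUrl: 'https://example.com',
  bots: ['GPTBot', 'ClaudeBot'],
});
console.log(prodRobots);
```

Keeping the crawler names in one array avoids repeating the User-agent and Disallow pair by hand for every bot.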
scripts/robots-generator.js
'use strict';

// Registers a generator that writes robots.txt on every hexo generate.
hexo.extend.generator.register('robots', function (locals) {
  const isProd = process.env.NODE_ENV === 'production';

  // Rules used in production: block each AI crawler.
  const aiBlock = [
    'User-agent: GPTBot',
    'Disallow: /',
    '',
    'User-agent: ClaudeBot',
    'Disallow: /',
    '',
    'User-agent: anthropic-ai',
    'Disallow: /',
    '',
    'User-agent: CCBot',
    'Disallow: /',
    '',
    'User-agent: Google-Extended',
    'Disallow: /',
  ].join('\n');

  // Rules used everywhere else (for example, staging).
  const permissive = [
    'User-agent: GPTBot',
    'Allow: /',
  ].join('\n');

  const content = [
    'User-agent: *',
    'Allow: /',
    '',
    isProd ? aiBlock : permissive,
    '',
    `Sitemap: ${hexo.config.url}/sitemap.xml`,
  ].join('\n');

  return {
    path: 'robots.txt',
    data: content,
  };
});
Note: a scripts/ generator takes priority over a static plugin but conflicts with a manual static file, so remove source/robots.txt when you use it.
Run with environment variable
# Production build — AI bots blocked
NODE_ENV=production npx hexo generate
# Staging build — permissive
npx hexo generate
noai meta tag in theme layouts
To add the noai meta tag to every page, edit your theme's head partial. Hexo themes use EJS, Nunjucks, Pug, or Swig — the approach is the same in each.
The head partial is usually at themes/[theme]/layout/_partial/head.ejs (or head.njk, head.pug). If your theme uses a single layout file, look for layout.ejs in the theme's layout/ directory.
EJS theme (themes/[theme]/layout/_partial/head.ejs)
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title><%= page.title ? page.title + ' | ' : '' %><%= config.title %></title>
  <!-- AI bot blocking -->
  <meta name="robots" content="<%= page.robots || 'noai, noimageai' %>">
  <%- partial('head/open_graph') %>
  <%- css('css/style') %>
</head>
Nunjucks theme (themes/[theme]/layout/_partial/head.njk)
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>{{ page.title + ' | ' if page.title }}{{ config.title }}</title>
  <!-- AI bot blocking -->
  <meta name="robots" content="{{ page.robots or 'noai, noimageai' }}">
  {{ partial('head/open_graph') }}
  {{ css('css/style') }}
</head>
Pug theme (themes/[theme]/layout/_partial/head.pug)
head
  meta(charset='UTF-8')
  meta(name='viewport', content='width=device-width, initial-scale=1.0')
  title= (page.title ? page.title + ' | ' : '') + config.title
  //- AI bot blocking
  meta(name='robots', content=page.robots || 'noai, noimageai')
  != partial('head/open_graph')
  != css('css/style')
The page.robots || 'noai, noimageai' pattern sets a default of noai, noimageai for all pages, with per-page override via front matter (see next section).
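The fallback behaves the same way in plain JavaScript, which makes it easy to check outside a theme. The sketch below mirrors the template expression; robotsMeta and the page objects are hypothetical stand-ins for what Hexo passes to the layout.

```javascript
// Mirrors the template expression: page.robots || 'noai, noimageai'
const DEFAULT_ROBOTS = 'noai, noimageai';

// Hypothetical helper: renders the robots meta tag for one page object.
function robotsMeta(page) {
  const content = page.robots || DEFAULT_ROBOTS;
  return `<meta name="robots" content="${content}">`;
}

// No robots field in front matter: the default applies.
console.log(robotsMeta({ title: 'My Post' }));
// Explicit value in front matter: it replaces the default entirely.
console.log(robotsMeta({ title: 'Landing Page', robots: 'index, follow' }));
// Empty value: still falsy, so the default applies again.
console.log(robotsMeta({ title: 'Empty', robots: '' }));
```

Because || replaces the default rather than merging with it, a page that sets robots: index, follow drops noai and noimageai unless they are repeated in the front matter value.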
Per-page override via front matter
Once your theme reads page.robots, you can override the robots value for any individual post or page via its front matter:
Block AI on all pages (default via theme)
---
title: My Post
date: 2026-01-01
# No robots field — theme default applies: "noai, noimageai"
---
Override on a specific page — allow indexing but no training
---
title: My Public Post
date: 2026-01-01
robots: index, follow, noai, noimageai
---
Override on a specific page — allow everything (e.g. landing page)
---
title: Landing Page
date: 2026-01-01
robots: index, follow
---
Override on a specific page — block everything
---
title: Private Content
date: 2026-01-01
robots: noindex, noai, noimageai
---
Note: page.robots is not a built-in Hexo variable, and the default theme does not read it. You must add it yourself to the theme's head.ejs partial, or switch to a community theme that supports it.
X-Robots-Tag via hosting platform
X-Robots-Tag is an HTTP response header. Hexo generates static files — there's no server to inject headers at runtime. Add the header at the hosting layer.
Netlify — netlify.toml
[[headers]]
  for = "/*"
  [headers.values]
    X-Robots-Tag = "noai, noimageai"
Vercel — vercel.json
{
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        {
          "key": "X-Robots-Tag",
          "value": "noai, noimageai"
        }
      ]
    }
  ]
}
Place vercel.json in your Hexo project root, not in source/: it is a deployment config file, not a static asset.
Cloudflare Pages — source/_headers
Cloudflare Pages reads a _headers file from the root of your published directory. Since Hexo copies source/ to public/, place the file in source/_headers:
/*
  X-Robots-Tag: noai, noimageai
After hexo generate, confirm public/_headers exists before deploying.
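The _headers format depends on layout: a path line at the start of a line, followed by indented header lines. A minimal sketch of a check for that layout is shown below; parseHeadersFile is a hypothetical helper, not part of any Cloudflare tooling, and it covers only paths, headers, and # comments.

```javascript
// Hypothetical checker for the _headers layout: an unindented path line
// followed by indented "Name: value" lines.
function parseHeadersFile(text) {
  const rules = {};
  let currentPath = null;

  for (const rawLine of text.split(/\r?\n/)) {
    const trimmed = rawLine.trim();
    if (trimmed === '' || trimmed.startsWith('#')) continue; // blanks, comments

    const indented = /^\s/.test(rawLine);
    if (!indented) {
      // Any unindented line is treated as a new path rule.
      currentPath = trimmed;
      rules[currentPath] = {};
    } else if (currentPath !== null) {
      const idx = rawLine.indexOf(':');
      if (idx === -1) continue;
      const name = rawLine.slice(0, idx).trim().toLowerCase();
      rules[currentPath][name] = rawLine.slice(idx + 1).trim();
    }
  }
  return rules;
}

const headersFile = '/*\n  X-Robots-Tag: noai, noimageai\n';
const parsed = parseHeadersFile(headersFile);
console.log(parsed['/*']['x-robots-tag']);
```

If the header line loses its indentation, this sketch reads it as a second path and the /* rule ends up with no headers, which is the mistake worth catching before a deploy.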
GitHub Pages
GitHub Pages does not allow custom HTTP headers, so X-Robots-Tag is not possible. Apart from robots.txt, the noai meta tag (in your theme) is your only option. For hard blocking, migrate to Netlify, Vercel, or Cloudflare Pages.
Hard 403 via edge functions
For hard UA-based blocking (returning 403 before serving any content), use an edge function at your hosting provider.
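Each platform example in this section applies the same test: a case-insensitive substring match of the User-Agent header against a list of crawler names. A minimal, platform-independent sketch of that test follows; isAiBot and statusFor are hypothetical names, and the user-agent strings are illustrative samples.

```javascript
// Same crawler list as the edge function examples in this section.
const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

// Lowercase once, so each request lowercases only its own user agent.
const AI_BOTS_LOWER = AI_BOTS.map((bot) => bot.toLowerCase());

// Hypothetical helper: true when the user agent contains a listed name.
function isAiBot(userAgent) {
  const ua = (userAgent || '').toLowerCase();
  return AI_BOTS_LOWER.some((bot) => ua.includes(bot));
}

// Hypothetical helper: the status code an edge function would return.
function statusFor(userAgent) {
  return isAiBot(userAgent) ? 403 : 200;
}

const botUa = 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)';
const browserUa = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';

console.log(statusFor(botUa));     // 403
console.log(statusFor(browserUa)); // 200
console.log(statusFor(null));      // 200 (a missing header is not treated as a bot)
```

Google-Extended is a robots.txt control token rather than a string Google sends in the User-Agent header, so a check like this is unlikely to ever match it; that entry is enforced through robots.txt.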
Netlify Edge Function
Create netlify/edge-functions/block-ai-bots.ts in your Hexo project root:
import type { Context } from '@netlify/edge-functions';

const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export default async function handler(
  request: Request,
  context: Context
): Promise<Response> {
  const ua = request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
}

export const config = { path: '/*' };
Register it in netlify.toml:
[build]
  command = "npx hexo generate"
  publish = "public"

[[edge_functions]]
  path = "/*"
  function = "block-ai-bots"
Cloudflare Pages Functions
Create functions/_middleware.ts in your Hexo project root (Cloudflare Pages Functions live outside source/):
import type { PagesFunction } from '@cloudflare/workers-types';

const AI_BOTS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'CCBot',
  'Google-Extended', 'AhrefsBot', 'Bytespider',
  'Amazonbot', 'Diffbot', 'FacebookBot', 'cohere-ai',
  'PerplexityBot', 'YouBot',
];

export const onRequest: PagesFunction = async (context) => {
  const ua = context.request.headers.get('user-agent') || '';
  const isBot = AI_BOTS.some((bot) =>
    ua.toLowerCase().includes(bot.toLowerCase())
  );
  if (isBot) {
    return new Response('Forbidden', { status: 403 });
  }
  return context.next();
};
Set the build command to npx hexo generate and the output directory to public in the Cloudflare Pages dashboard. The functions/ directory is picked up automatically.
Vercel Edge Middleware
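Create middleware.ts in the Hexo project root. This example imports from next/server, so it assumes the next package is available in the project (the quick-reference table marks Vercel edge functions as "Next.js required"):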
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

const AI_BOTS = [
  'gptbot', 'claudebot', 'anthropic-ai', 'ccbot',
  'google-extended', 'ahrefsbot', 'bytespider',
  'amazonbot', 'diffbot', 'facebookbot', 'cohere-ai',
  'perplexitybot', 'youbot',
];

export function middleware(request: NextRequest) {
  const ua = (request.headers.get('user-agent') || '').toLowerCase();
  if (AI_BOTS.some((bot) => ua.includes(bot))) {
    return new NextResponse('Forbidden', { status: 403 });
  }
  return NextResponse.next();
}

export const config = {
  matcher: ['/((?!_next/static|_next/image|favicon.ico).*)'],
};
Deployment quick-reference
| Platform | Build command | Publish dir | Custom headers | Edge functions |
|---|---|---|---|---|
| Netlify | npx hexo generate | public | ✅ netlify.toml | ✅ netlify/edge-functions/ |
| Vercel | npx hexo generate | public | ✅ vercel.json | ⚠️ Next.js required |
| Cloudflare Pages | npx hexo generate | public | ✅ source/_headers | ✅ functions/_middleware.ts |
| GitHub Pages | npx hexo generate --deploy | public | 🚫 not supported | 🚫 not supported |
| Firebase Hosting | npx hexo generate | public | ✅ firebase.json headers | ✅ Cloud Functions |
| AWS S3 + CloudFront | npx hexo generate | public | ✅ CloudFront response policy | ✅ Lambda@Edge |
Full _config.yml excerpt
# _config.yml
url: https://example.com
root: /

# If using hexo-generator-feed (RSS) — note this doesn't affect robots
feed:
  type: atom
  path: atom.xml

# If using hexo-generator-sitemap
sitemap:
  path: sitemap.xml
  tag: false
  category: false
package.json scripts
{
  "scripts": {
    "build": "hexo generate",
    "build:prod": "NODE_ENV=production hexo generate",
    "clean": "hexo clean",
    "server": "hexo server",
    "deploy": "hexo deploy"
  }
}
FAQ
Does Hexo have built-in robots.txt support?
Yes. Any file placed in source/ is copied to public/ on build — including robots.txt. No plugin or configuration required. Create source/robots.txt and run hexo generate.
How do I add the noai meta tag to every Hexo page?
Edit your theme's layout file (themes/[theme]/layout/_partial/head.ejs or head.njk). Add <meta name="robots" content="noai, noimageai"> inside the <head> block. For per-page override, use front matter (robots: index, noai) and a conditional in the template (<%= page.robots || 'noai, noimageai' %>).
Can I block AI bots with a Hexo plugin?
Yes — write a generator plugin (hexo.extend.generator.register) to programmatically produce robots.txt at build time. Place it in the scripts/ directory in your Hexo root — Hexo auto-loads all .js files in scripts/.
How do I add X-Robots-Tag on Hexo sites?
X-Robots-Tag is an HTTP response header, and Hexo generates static files rather than running a server. Add it at the hosting layer: Netlify uses [[headers]] in netlify.toml; Vercel uses a "headers" array in vercel.json; Cloudflare Pages uses a _headers file in source/ (copied to public/).
Does GitHub Pages support X-Robots-Tag for Hexo sites?
No. GitHub Pages does not allow custom HTTP headers. For X-Robots-Tag or hard 403 blocking, migrate to Netlify, Vercel, or Cloudflare Pages — all support custom headers and edge functions.
How do I block AI bots on a Hexo site hosted on Cloudflare Pages?
Two options: (1) Create source/_headers with X-Robots-Tag: noai, noimageai for all paths — Hexo copies it to public/_headers on build. (2) Add a functions/_middleware.ts Cloudflare Pages Function that checks the User-Agent header and returns a 403 for known AI crawlers.