
How to Block AI Bots on Koa.js: Complete 2026 Guide

Koa was built by the team behind Express to fix Express's callback-era design. It ships with no bundled middleware and no built-in router; everything flows through app.use() with async/await. Bot blocking takes advantage of Koa's onion model: a single middleware function has two phases, an upstream phase (before await next()) where you block bots and a downstream phase (after await next()) where you inject response headers.

The onion model

Request →  [Middleware A: upstream]
              [Middleware B: upstream]   ← bot check here
                [Route handler]
              [Middleware B: downstream] ← header injection here
           [Middleware A: downstream]
← Response
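The same two-phase ordering can be reproduced without Koa itself. This dependency-free sketch composes middleware in the style of koa-compose; it is an illustration of the dispatch pattern, not Koa's actual source:

```javascript
// Minimal middleware composer in the koa-compose style (illustrative sketch).
function compose(middleware) {
  return function run(ctx) {
    function dispatch(i) {
      const fn = middleware[i];
      if (!fn) return Promise.resolve(); // innermost point of the onion
      return Promise.resolve(fn(ctx, () => dispatch(i + 1)));
    }
    return dispatch(0);
  };
}

const order = [];
const app = compose([
  async (ctx, next) => { order.push('A up'); await next(); order.push('A down'); },
  async (ctx, next) => { order.push('B up'); await next(); order.push('B down'); },
  async () => { order.push('handler'); },
]);

app({}).then(() => console.log(order.join(' → ')));
// Upstream runs outside-in, downstream inside-out:
// A up → B up → handler → B down → A down
```

Each `await next()` suspends the current middleware until everything deeper in the onion has finished, which is exactly why a header set after `await next()` lands on the already-built response.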

Four protection layers

1. robots.txt: serve from public/ via koa-static (registered before the bot middleware)
2. noai meta tag: {{ robots or 'noai, noimageai' }} in your Nunjucks base template via ctx.state
3. X-Robots-Tag header: ctx.set() in the middleware's downstream phase, after await next() returns
4. Hard 403 block: ctx.status + ctx.body + return in the middleware's upstream phase, before await next()

Layer 1: robots.txt

Koa has no built-in static file serving. Add koa-static to serve the public/ directory. Register it before the bot-blocking middleware so /robots.txt is served as a static file before any UA check runs.

# public/robots.txt

User-agent: *
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: Applebot-Extended
User-agent: PerplexityBot
User-agent: Diffbot
User-agent: cohere-ai
User-agent: FacebookBot
User-agent: omgili
User-agent: omgilibot
User-agent: Amazonbot
User-agent: DeepSeekBot
User-agent: MistralBot
User-agent: xAI-Bot
User-agent: AI2Bot
Disallow: /

// app.js — static serving before bot middleware
import Koa from 'koa';
import serve from 'koa-static';
import path from 'path';
import { fileURLToPath } from 'url';

const __dirname = path.dirname(fileURLToPath(import.meta.url));
const app = new Koa();

// Register static FIRST — /robots.txt is served before the bot check
app.use(serve(path.join(__dirname, 'public')));

// Bot-blocking middleware registered after static
app.use(aiBotBlocker);

Alternative: Register koa-static after the bot middleware but add /robots.txt to EXEMPT_PATHS in the middleware. Both approaches work — the first is simpler.
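For reference, the alternative ordering looks like this (reusing app, serve, path, and __dirname from the snippet above); it only works if /robots.txt is in the middleware's exempt-path set:

```javascript
// app.js — alternative order: bot check first, static files downstream.
// Requires /robots.txt (and friends) in the middleware's EXEMPT_PATHS,
// otherwise a bot's robots.txt request is 403'd before koa-static runs.
app.use(aiBotBlocker);
app.use(serve(path.join(__dirname, 'public')));
```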

Layer 2: noai meta tag

Koa works with any template engine via koa-views. The example uses Nunjucks, but the pattern is the same for EJS, Pug, or Handlebars. Set ctx.state.robots in your route handler — koa-views automatically merges ctx.state into the template context.

Base Nunjucks template

{# views/base.njk #}
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>{% block title %}My App{% endblock %}</title>

  {# AI bot training opt-out. Per-page override: set ctx.state.robots in route. #}
  <meta name="robots" content="{{ robots or 'noai, noimageai' }}">
</head>
<body>
  {% block content %}{% endblock %}
</body>
</html>

Route — default (no override)

// No ctx.state.robots → template defaults to "noai, noimageai"
router.get('/', async (ctx) => {
  await ctx.render('home', { title: 'Home' });
});

Route — per-page override

// Public pages that should be indexed normally:
router.get('/about', async (ctx) => {
  ctx.state.robots = 'index, follow'; // merges into koa-views template context
  await ctx.render('about', { title: 'About' });
});

Layers 3 & 4: bot-blocking middleware

A single async middleware function handles both the upstream block (Layer 4) and the downstream header injection (Layer 3). The split happens around await next().

// middleware/aiBotBlocker.js
const AI_BOT_PATTERNS = [
  'gptbot', 'chatgpt-user', 'oai-searchbot',
  'claudebot', 'anthropic-ai', 'claude-web',
  'google-extended', 'ccbot', 'bytespider',
  'applebot-extended', 'perplexitybot', 'diffbot',
  'cohere-ai', 'facebookbot', 'meta-externalagent',
  'omgili', 'omgilibot', 'amazonbot',
  'deepseekbot', 'mistralbot', 'xai-bot', 'ai2bot', // 'AI2Bot' lowercases to 'ai2bot'
];

// Only needed if koa-static is registered AFTER this middleware
const EXEMPT_PATHS = new Set(['/robots.txt', '/sitemap.xml', '/favicon.ico']);

export async function aiBotBlocker(ctx, next) {
  // ── Upstream phase (before route handler) ──────────────────────────

  // Pass through exempt paths (skip if koa-static is registered first)
  if (EXEMPT_PATHS.has(ctx.path)) {
    return next();
  }

  const ua = (ctx.get('User-Agent') || '').toLowerCase();

  // Layer 4: hard block — do NOT call next() for bot requests
  if (AI_BOT_PATTERNS.some(pattern => ua.includes(pattern))) {
    ctx.status = 403;
    ctx.body   = 'Forbidden';
    ctx.set('Content-Type', 'text/plain');
    return; // stop — no downstream middleware, no route handler
  }

  // ── Call next middleware / route handler ────────────────────────────
  await next();

  // ── Downstream phase (after route handler returns) ──────────────────

  // Layer 3: inject X-Robots-Tag on every legitimate response
  ctx.set('X-Robots-Tag', 'noai, noimageai');
}

Key points

  • return (without calling next()) after setting the 403 response — Koa does not throw if you skip next(); it simply stops the middleware chain.
  • ctx.get('User-Agent') is the Koa helper for ctx.request.headers['user-agent'] — case-insensitive header lookup built in.
  • ctx.set() in the downstream phase adds headers to the response after it has been constructed — works for HTML pages, JSON API responses, and error responses alike.
  • Downstream code only runs for requests that passed the bot check (since bots return early). This means the X-Robots-Tag is never set on 403 responses — which is correct.
  • Unlike Express, Koa does not require calling res.end() or res.send() — setting ctx.body is sufficient.
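Because the middleware is a plain async function, both phases can be exercised without starting a server. This sketch inlines a condensed copy of the middleware (shortened pattern list) plus a hand-rolled mock ctx; mockCtx and its responseHeaders field are illustrative test scaffolding, not a Koa API:

```javascript
// Condensed copy of the middleware (shortened pattern list) plus a
// hand-rolled mock ctx, so both phases run without Koa or a server.
const AI_BOT_PATTERNS = ['gptbot', 'claudebot', 'ccbot'];

async function aiBotBlocker(ctx, next) {
  const ua = (ctx.get('User-Agent') || '').toLowerCase();
  if (AI_BOT_PATTERNS.some((p) => ua.includes(p))) {
    ctx.status = 403;
    ctx.body = 'Forbidden';
    return; // upstream block: next() never runs
  }
  await next();
  ctx.set('X-Robots-Tag', 'noai, noimageai'); // downstream injection
}

// Mock ctx: just enough surface for the middleware above.
function mockCtx(userAgent) {
  const responseHeaders = {};
  return {
    status: 404,
    body: undefined,
    responseHeaders,
    get: (name) => (name.toLowerCase() === 'user-agent' ? userAgent : ''),
    set: (name, value) => { responseHeaders[name] = value; },
  };
}

const bot = mockCtx('Mozilla/5.0 (compatible; GPTBot/1.0)');
const human = mockCtx('Mozilla/5.0 (Windows NT 10.0; Win64; x64)');

(async () => {
  await aiBotBlocker(bot, async () => { bot.status = 200; });
  await aiBotBlocker(human, async () => { human.status = 200; });
  console.log(bot.status, bot.responseHeaders['X-Robots-Tag']);     // 403 undefined
  console.log(human.status, human.responseHeaders['X-Robots-Tag']); // 200 noai, noimageai
})();
```

The bot request never reaches the "handler" next() callback and picks up no X-Robots-Tag; the normal request gets both.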

Full app wiring

// app.js
import Koa from 'koa';
import Router from '@koa/router';
import serve from 'koa-static';
import views from 'koa-views';
import path from 'path';
import { fileURLToPath } from 'url';
import { aiBotBlocker } from './middleware/aiBotBlocker.js';

const __dirname = path.dirname(fileURLToPath(import.meta.url));
const app    = new Koa();
const router = new Router();

// 1. Static files first — /robots.txt served before any middleware
app.use(serve(path.join(__dirname, 'public')));

// 2. Template engine — koa-views merges ctx.state into template context
app.use(views(path.join(__dirname, 'views'), { extension: 'njk', map: { njk: 'nunjucks' } }));

// 3. Bot blocker — runs for all non-static requests
app.use(aiBotBlocker);

// 4. Routes
router.get('/', async (ctx) => {
  await ctx.render('home', { title: 'Home' });
});

router.get('/about', async (ctx) => {
  ctx.state.robots = 'index, follow';
  await ctx.render('about', { title: 'About' });
});

app.use(router.routes());
app.use(router.allowedMethods());

app.listen(3000, () => console.log('Listening on :3000'));

package.json dependencies

{
  "type": "module",
  "dependencies": {
    "koa": "^2.15.0",
    "@koa/router": "^12.0.0",
    "koa-static": "^5.0.0",
    "koa-views": "^8.0.0",
    "nunjucks": "^3.2.4"
  }
}
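Assuming npm, the same dependency set installs with:

```shell
npm install koa @koa/router koa-static koa-views nunjucks
```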

Selective route blocking

To block bots only on specific routes — e.g. an API or premium content — use router.use() with a path prefix instead of the global app.use().

// Apply bot block only to /api/* routes
router.use('/api', aiBotBlocker);

router.get('/api/data', async (ctx) => {
  ctx.body = { message: 'Protected API response' };
});

// Public routes — no bot blocking applied
router.get('/', async (ctx) => {
  await ctx.render('home');
});

Verification

# Layer 1 — robots.txt served by koa-static
curl https://yoursite.com/robots.txt

# Layer 3 — X-Robots-Tag on a normal page
curl -I https://yoursite.com/
# Expected: X-Robots-Tag: noai, noimageai

# Layer 4 — hard 403 on bot user-agent
curl -A "Mozilla/5.0 (compatible; GPTBot/1.0)" -I https://yoursite.com/
# Expected: HTTP/1.1 403 Forbidden

# Confirm robots.txt exempt from hard block
curl -A "GPTBot" -I https://yoursite.com/robots.txt
# Expected: HTTP/1.1 200 OK  (served by koa-static before bot check)

FAQ

What is Koa's onion model and why does it matter for bot blocking?

Koa middleware runs in two phases separated by `await next()`. The upstream phase runs before next() — this is where you check the user-agent and block bots with an early return. The downstream phase runs after next() returns — this is where you inject the X-Robots-Tag header, because the route handler has already built the response. If you return early for a bot, the downstream code never runs, so blocked requests get no X-Robots-Tag. This is correct — there's no point setting a header on a 403 response that the bot won't follow.

Does koa-static intercept the bot-blocking middleware?

If koa-static is registered before the bot middleware (recommended), it serves /robots.txt as a static file and the bot middleware never sees that request. If you register the bot middleware first, add EXEMPT_PATHS with /robots.txt — the middleware calls next() for that path without checking the user-agent. Either approach works. Registering koa-static first is simpler and more explicit.

How do I pass variables to Nunjucks templates for the noai meta tag?

Set ctx.state.robots in your route handler. koa-views automatically merges ctx.state into the Nunjucks template context, so {{ robots or 'noai, noimageai' }} renders correctly. When robots is not set in ctx.state, Nunjucks evaluates it as falsy and the `or` operator returns the fallback. You can also use {{ robots | default('noai, noimageai') }} — both produce the same result.

Can I block bots only on specific routes?

Yes. Use router.use('/prefix', aiBotBlocker) from @koa/router to apply the middleware to a specific route prefix. Routes outside that prefix are unaffected. Alternatively, call the middleware function directly in individual route handlers: router.get('/protected', aiBotBlocker, async (ctx) => { ... }). The selective approach is useful when you want to protect API endpoints or premium content while keeping your public homepage fully accessible.

Why use ctx.get() and ctx.set() instead of accessing headers directly?

ctx.get(headerName) is case-insensitive and returns an empty string (not undefined) when the header is absent, so it is safe to call .toLowerCase() on without null-checking. ctx.set(name, value) is the Koa-idiomatic way to set response headers; it delegates to Node's http.ServerResponse.setHeader(). Accessing ctx.request.headers['user-agent'] directly also works, because Node normalizes incoming header names to lowercase, but you must use the lowercase key and guard against undefined yourself.
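The direct-access pitfall is easy to demonstrate. Node's HTTP parser stores incoming header names in lowercase, so a mixed-case key silently misses (a standalone sketch of the req.headers shape, not Koa code):

```javascript
// Node normalizes incoming request header names to lowercase,
// so req.headers is keyed by 'user-agent', never 'User-Agent'.
const headers = { 'user-agent': 'GPTBot/1.0' }; // shape of req.headers
console.log(headers['User-Agent']); // undefined (wrong key case)
console.log(headers['user-agent']); // GPTBot/1.0
// ctx.get('User-Agent') normalizes the name for you and returns ''
// (not undefined) when the header is absent.
```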
