
How to Block AI Bots on Sails.js (Node.js): Complete 2026 Guide

Sails.js is a Node.js MVC framework built on Express. It adds policies (access-control functions that run before controller actions) on top of Express middleware. Unlike raw Express app.use(), Sails policies are configured per controller and per action in config/policies.js.

Policies vs HTTP middleware — use both

HTTP middleware (in config/http.js) runs at the Express layer — before routing, before policies. Use it for the earliest, broadest blocking. Policies (in api/policies/) run after routing but before controller actions — use them for per-controller or per-action control. For AI bot blocking, either works; HTTP middleware is marginally more efficient.

Protection layers

1. robots.txt: assets/robots.txt, auto-served by Sails' built-in Express static middleware
2. noai meta tag: set res.locals.robots in middleware/policy; the EJS/Pug layout reads it
3. X-Robots-Tag header: res.set("X-Robots-Tag", "noai, noimageai") in config/http.js middleware
4. Hard 403 block: return res.forbidden() in a policy or res.status(403).send() in middleware, so the action never runs

Layer 1: robots.txt

Place robots.txt in the assets/ directory. Sails compiles assets into .tmp/public/ and serves them via Express static middleware. Policies apply only to controller actions, so static files are never subject to them:

# assets/robots.txt

User-agent: *
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: cohere-ai
User-agent: Bytespider
User-agent: Amazonbot
User-agent: PerplexityBot
User-agent: YouBot
User-agent: Diffbot
User-agent: DeepSeekBot
User-agent: MistralBot
User-agent: xAI-Bot
User-agent: AI2Bot
Disallow: /
assets/ — not public/
Sails uses assets/ (compiled to .tmp/public/). Express uses public/. Static files bypass policies entirely.
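
If you would rather keep the bot list in one place than edit robots.txt by hand, the file can be generated. The following is a minimal sketch, not part of Sails: the helper name buildRobotsTxt and the constant AI_BOT_TOKENS are hypothetical, and the tokens are copied from the example above.

```javascript
// Hypothetical helper: builds the robots.txt body shown above from a token list.
const AI_BOT_TOKENS = [
  'GPTBot', 'ClaudeBot', 'anthropic-ai', 'Google-Extended', 'CCBot',
  'cohere-ai', 'Bytespider', 'Amazonbot', 'PerplexityBot', 'YouBot',
  'Diffbot', 'DeepSeekBot', 'MistralBot', 'xAI-Bot', 'AI2Bot',
];

// One permissive group for everyone, then one group that
// disallows every listed AI crawler token.
function buildRobotsTxt(tokens) {
  const lines = ['User-agent: *', 'Allow: /', ''];
  for (const token of tokens) {
    lines.push(`User-agent: ${token}`);
  }
  lines.push('Disallow: /', '');
  return lines.join('\n');
}

// Print the result; redirect it into assets/robots.txt when it looks right.
console.log(buildRobotsTxt(AI_BOT_TOKENS));
```

Writing the output to assets/robots.txt (for example from a small build script) keeps the file in the directory Sails actually serves.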

Approach 1: Policy (idiomatic Sails)

Create api/policies/isNotAiBot.js. Call next() to continue or return a response to block:

// api/policies/isNotAiBot.js
module.exports = function isNotAiBot(req, res, next) {
  const AI_BOTS = [
    'gptbot', 'chatgpt-user', 'claudebot', 'anthropic-ai',
    'ccbot', 'cohere-ai', 'bytespider', 'amazonbot',
    'applebot-extended', 'perplexitybot', 'youbot', 'diffbot',
    'google-extended', 'deepseekbot', 'mistralbot', 'xai-bot',
    'ai2bot', 'oai-searchbot', 'duckassistbot',
  ];

  // Set noai meta for templates
  res.locals.robots = 'noai, noimageai';

  const ua = (req.headers['user-agent'] || '').toLowerCase();
  const isBot = AI_BOTS.some(bot => ua.includes(bot));

  if (isBot) {
    return res.status(403).send('Forbidden: AI crawlers are not permitted.');
  }

  // Set X-Robots-Tag on all legitimate responses
  res.set('X-Robots-Tag', 'noai, noimageai');
  return next();
};

Register globally in config/policies.js:

// config/policies.js
module.exports.policies = {
  // Apply to ALL controller actions globally
  '*': ['isNotAiBot'],

  // Exempt specific controllers if needed:
  // 'HealthController': { '*': true },  // true = no policies
};

Controller-scoped policies

To block only on API controllers, apply per-controller instead of globally:

// config/policies.js
module.exports.policies = {
  // Only block on API controller
  'api/*': ['isNotAiBot'],

  // Or per-action:
  'ArticleController': {
    'find': ['isNotAiBot'],
    'findOne': ['isNotAiBot'],
    'create': ['isAuthenticated'],  // different policy
  },

  // Public pages — no bot blocking
  'PageController': { '*': true },
};

Approach 2: HTTP middleware (Express layer)

For the earliest possible blocking (before Sails routing), add a custom middleware in config/http.js:

// config/http.js
module.exports.http = {
  middleware: {
    // Add your custom middleware to the order array
    order: [
      'aiBotBlocker',  // ← BEFORE bodyParser, session, router
      'cookieParser',
      'session',
      'bodyParser',
      'compress',
      'poweredBy',
      'router',
      'www',
      'favicon',
    ],

    aiBotBlocker: function (req, res, next) {
      const AI_BOTS = [
        'gptbot', 'chatgpt-user', 'claudebot', 'anthropic-ai',
        'ccbot', 'cohere-ai', 'bytespider', 'amazonbot',
        'applebot-extended', 'perplexitybot', 'youbot', 'diffbot',
        'google-extended', 'deepseekbot', 'mistralbot', 'xai-bot',
        'ai2bot', 'oai-searchbot', 'duckassistbot',
      ];

      const EXEMPT_PATHS = ['/robots.txt', '/sitemap.xml', '/favicon.ico'];

      // Exempt paths bypass blocking
      if (EXEMPT_PATHS.includes(req.path)) {
        return next();
      }

      const ua = (req.headers['user-agent'] || '').toLowerCase();
      if (AI_BOTS.some(bot => ua.includes(bot))) {
        return res.status(403).send('Forbidden: AI crawlers are not permitted.');
      }

      // X-Robots-Tag on all legitimate responses
      res.set('X-Robots-Tag', 'noai, noimageai');
      return next();
    },
  },
};
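middleware.order: position matters
Place aiBotBlocker first in the order array, before bodyParser and session. This rejects bots before any body parsing or session creation (the same reasoning behind prepend() in CakePHP). Because it also runs before www, requests for static files such as /robots.txt pass through it too, which is why the EXEMPT_PATHS check is needed.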
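
Both approaches above repeat the same AI_BOTS array. One way to avoid the two lists drifting apart is to move the matching into a plain module and require it from the policy and from config/http.js. This is a sketch under assumptions: the file path and the isAiBot name are hypothetical, not a Sails convention.

```javascript
// api/utils/ai-bots.js (hypothetical path; any plain module location works)

// Lowercased substrings, copied from the lists used in both approaches.
const AI_BOTS = [
  'gptbot', 'chatgpt-user', 'claudebot', 'anthropic-ai',
  'ccbot', 'cohere-ai', 'bytespider', 'amazonbot',
  'applebot-extended', 'perplexitybot', 'youbot', 'diffbot',
  'google-extended', 'deepseekbot', 'mistralbot', 'xai-bot',
  'ai2bot', 'oai-searchbot', 'duckassistbot',
];

// Returns true when the User-Agent value contains a known AI bot token.
// A missing header is treated as "not a known bot".
function isAiBot(userAgent) {
  const ua = String(userAgent || '').toLowerCase();
  return AI_BOTS.some((bot) => ua.includes(bot));
}

module.exports = { AI_BOTS, isAiBot };
```

The policy and the middleware would then both call isAiBot(req.headers['user-agent']) instead of carrying their own copy of the array.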

Layer 2: noai meta tag

res.locals passes data to views in Sails (EJS, Pug, etc.). Set it in the policy or middleware, read in your layout:

<!-- views/layouts/layout.ejs -->
<head>
  <meta name="robots" content="<%= locals.robots || 'noai, noimageai' %>">
</head>

<!-- Override per-action in a controller: -->
<!-- res.locals.robots = 'index, follow'; -->
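
The per-action override mentioned in the layout comment could look like the sketch below. The controller name, action name, and view path are hypothetical; res.locals and res.view() are the standard Sails response properties. Because actions run after policies, a value set here replaces whatever the policy put in res.locals.robots.

```javascript
// api/controllers/PageController.js (hypothetical controller and view names)
const PageController = {
  // Public landing page: allow normal indexing for this action only.
  home: function (req, res) {
    res.locals.robots = 'index, follow';
    return res.view('pages/homepage');
  },
};

module.exports = PageController;
```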

Sails.js vs Express vs NestJS — Node.js comparison

Sails.js — policy (returns res.forbidden() or calls next())

// api/policies/isNotAiBot.js
// AI_BOTS is the lowercase token array from Approach 1 (omitted here for brevity)
module.exports = function (req, res, next) {
  const ua = (req.headers['user-agent'] || '').toLowerCase();
  if (AI_BOTS.some(bot => ua.includes(bot)))
    return res.status(403).send('Forbidden');
  return next(); // continue to action
};

Express — app.use() middleware

// Direct Express middleware
// app is an Express application; AI_BOTS is the same lowercase token array
app.use((req, res, next) => {
  const ua = (req.headers['user-agent'] || '').toLowerCase();
  if (AI_BOTS.some(bot => ua.includes(bot)))
    return res.status(403).send('Forbidden');
  next();
});

NestJS — Guard with @UseGuards()

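import { CanActivate, ExecutionContext, ForbiddenException, Injectable } from '@nestjs/common';

// AI_BOTS is the same lowercase token array used in the Sails examples
@Injectable()
export class AiBotGuard implements CanActivate {
  canActivate(context: ExecutionContext): boolean {
    const req = context.switchToHttp().getRequest();
    const ua = (req.headers['user-agent'] || '').toLowerCase();
    if (AI_BOTS.some(b => ua.includes(b)))
      throw new ForbiddenException('AI crawlers blocked');
    return true;
  }
}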

Testing

Use supertest with sails.lift() in your test setup. The example below uses Mocha's before/after hooks and Node's built-in assert module:

// test/integration/policies/isNotAiBot.test.js
const assert = require('assert');
const sails = require('sails');
const request = require('supertest');

describe('AI Bot Blocking', () => {
  // Lift the app once for the whole suite, lower it afterwards
  before((done) => sails.lift({ log: { level: 'silent' } }, done));
  after((done) => sails.lower(done));

  it('blocks AI bots with 403', async () => {
    await request(sails.hooks.http.app)
      .get('/api/articles')
      .set('User-Agent', 'GPTBot/1.0')
      .expect(403);
  });

  it('allows normal browsers', async () => {
    const res = await request(sails.hooks.http.app)
      .get('/api/articles')
      .set('User-Agent', 'Mozilla/5.0 (compatible)')
      .expect(200);
    assert.strictEqual(res.headers['x-robots-tag'], 'noai, noimageai');
  });

  it('serves robots.txt to bots', async () => {
    await request(sails.hooks.http.app)
      .get('/robots.txt')
      .set('User-Agent', 'GPTBot/1.0')
      .expect(200);
  });
});
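
Lifting the whole app is not always necessary. A policy is a plain (req, res, next) function, so its logic can also be checked with stub objects. The sketch below is an assumption-laden illustration: runPolicy and standInPolicy are hypothetical names, and the stand-in is a trimmed copy of the policy so the example runs on its own. In a real test you would require() the actual file from api/policies/ instead.

```javascript
// Stub harness: runs a (req, res, next) policy and records what it did.
function runPolicy(policy, userAgent) {
  const result = { statusCode: null, body: null, headers: {}, nextCalled: false };
  const req = { headers: userAgent ? { 'user-agent': userAgent } : {} };
  const res = {
    locals: {},
    status(code) { result.statusCode = code; return res; },
    send(body) { result.body = body; return res; },
    set(name, value) { result.headers[name.toLowerCase()] = value; return res; },
  };
  policy(req, res, () => { result.nextCalled = true; });
  return result;
}

// Trimmed stand-in for api/policies/isNotAiBot.js (two tokens only).
function standInPolicy(req, res, next) {
  const ua = (req.headers['user-agent'] || '').toLowerCase();
  if (['gptbot', 'claudebot'].some((bot) => ua.includes(bot))) {
    return res.status(403).send('Forbidden');
  }
  res.set('X-Robots-Tag', 'noai, noimageai');
  return next();
}

const blocked = runPolicy(standInPolicy, 'GPTBot/1.0');
const allowed = runPolicy(standInPolicy, 'Mozilla/5.0 (compatible)');
console.log(blocked.statusCode, blocked.nextCalled); // 403 false
console.log(allowed.headers['x-robots-tag'], allowed.nextCalled); // noai, noimageai true
```

This only exercises the policy function itself; the supertest suite above is still what confirms that the policy is wired up in config/policies.js.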

AI bot User-Agent strings (2026)

GPTBot, ChatGPT-User, ClaudeBot, anthropic-ai, CCBot, cohere-ai, Bytespider, Amazonbot, Applebot-Extended, PerplexityBot, YouBot, Diffbot, Google-Extended, FacebookBot, omgili, omgilibot, DeepSeekBot, MistralBot, xAI-Bot, AI2Bot

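Access the header via req.headers['user-agent'], lowercase it with .toLowerCase(), then match with .includes(). Note that Google-Extended and Applebot-Extended are robots.txt control tokens rather than strings sent in the User-Agent header, so they take effect in robots.txt, not in header matching.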
