How to Block AI Bots on CakePHP: Complete 2026 Guide
CakePHP 4 and 5 implement PSR-15 middleware — the same process(request, handler) interface as Slim 4. The difference is how you register it: CakePHP uses Application::middleware() with a MiddlewareQueue object. Use prepend() — not add() — to run the bot blocker before CakePHP's own stack.
Always use prepend() — not add()
The middleware queue that Application::middleware() builds (in the default app skeleton: error handling, assets, routing, body parsing, and CSRF protection) processes requests in order. Calling add() after those entries puts your middleware at the end of the queue, so routing, body parsing, and CSRF validation all run before the bot is blocked. Use prepend() to run the bot blocker first, before any of that processing.
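To make the ordering argument concrete, here is a minimal sketch in plain PHP. It is a toy model, not CakePHP's runner: the stage names (routing, csrf) and the helper functions are hypothetical stand-ins, and a status code stands in for a full response.

```php
<?php
declare(strict_types=1);

// Toy model of a middleware queue; NOT CakePHP's implementation.
// Each stage is callable(string $ua, callable $next): int (an HTTP status).

/** Build a stage that records its name, then delegates to the next stage. */
function passThroughStage(string $name, ArrayObject $log): callable
{
    return function (string $ua, callable $next) use ($name, $log): int {
        $log->append($name);
        return $next($ua);
    };
}

/** Build a stage that returns 403 for "gptbot" without calling $next. */
function blockerStage(ArrayObject $log): callable
{
    return function (string $ua, callable $next) use ($log): int {
        $log->append('blocker');
        if (stripos($ua, 'gptbot') !== false) {
            return 403; // short-circuit: later stages never run
        }
        return $next($ua);
    };
}

/** Run the stages in array order, ending at a stand-in controller. */
function runPipeline(array $stages, string $ua, ArrayObject $log): int
{
    $next = function (string $ua) use ($log): int {
        $log->append('controller');
        return 200;
    };
    // Wrap from last to first so that $stages[0] is the outermost stage.
    foreach (array_reverse($stages) as $stage) {
        $inner = $next;
        $next = function (string $ua) use ($stage, $inner): int {
            return $stage($ua, $inner);
        };
    }
    return $next($ua);
}

// Blocker first (the prepend() position): nothing else runs for a bot.
$first = new ArrayObject();
$statusFirst = runPipeline(
    [blockerStage($first), passThroughStage('routing', $first), passThroughStage('csrf', $first)],
    'GPTBot/1.0',
    $first
);

// Blocker last (the add() position): routing and csrf run before the 403.
$last = new ArrayObject();
$statusLast = runPipeline(
    [passThroughStage('routing', $last), passThroughStage('csrf', $last), blockerStage($last)],
    'GPTBot/1.0',
    $last
);

echo implode(' > ', $first->getArrayCopy()), " => {$statusFirst}\n";
echo implode(' > ', $last->getArrayCopy()), " => {$statusLast}\n";
```

Both runs end in a 403, but only the first avoids the work done by the earlier stages, which is the whole point of placing the blocker at the front.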
Protection layers
Layer 1: robots.txt
CakePHP's document root is webroot/ (not public/). Place robots.txt there — Apache and nginx serve it directly without invoking PHP:
# webroot/robots.txt
User-agent: *
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: CCBot
User-agent: cohere-ai
User-agent: Bytespider
User-agent: Amazonbot
User-agent: PerplexityBot
User-agent: YouBot
User-agent: Diffbot
User-agent: DeepSeekBot
User-agent: MistralBot
User-agent: xAI-Bot
User-agent: AI2Bot
Disallow: /
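If you would rather generate this file from a list than maintain it by hand, a small script can do it. The sketch below is one possible approach; buildRobotsTxt() is a hypothetical helper, not part of CakePHP, and the three bot names are just a sample of the list above.

```php
<?php
declare(strict_types=1);

/**
 * Build robots.txt content: allow everyone, then one group that
 * disallows every listed AI crawler. (Hypothetical helper.)
 *
 * @param string[] $bots User-agent tokens as they should appear in the file
 */
function buildRobotsTxt(array $bots): string
{
    $lines = ['User-agent: *', 'Allow: /'];

    // Only emit the blocking group when there is at least one bot;
    // a Disallow line without a User-agent line would be invalid.
    if ($bots !== []) {
        $lines[] = '';
        foreach ($bots as $bot) {
            $lines[] = 'User-agent: ' . $bot;
        }
        $lines[] = 'Disallow: /';
    }

    return implode("\n", $lines) . "\n";
}

$robots = buildRobotsTxt(['GPTBot', 'ClaudeBot', 'CCBot']);
echo $robots;

// To write the file from the project root, something like:
// file_put_contents('webroot/robots.txt', $robots);
```

Keeping the bot list in one place also makes it easier to keep robots.txt and the middleware's list in step with each other.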
Slim 4, Laravel, and Symfony use public/ as the document root. CakePHP uses webroot/. Put robots.txt in webroot/robots.txt.
The middleware class
Create src/Middleware/AiBotBlocker.php. CakePHP uses PSR-15 MiddlewareInterface — identical to Slim 4:
<?php
// src/Middleware/AiBotBlocker.php
declare(strict_types=1);

namespace App\Middleware;

use Cake\Http\Response;
use Psr\Http\Message\ResponseInterface;
use Psr\Http\Message\ServerRequestInterface;
use Psr\Http\Server\MiddlewareInterface;
use Psr\Http\Server\RequestHandlerInterface;

class AiBotBlocker implements MiddlewareInterface
{
    private const AI_BOTS = [
        'gptbot', 'chatgpt-user', 'claudebot', 'anthropic-ai',
        'ccbot', 'cohere-ai', 'bytespider', 'amazonbot',
        'applebot-extended', 'perplexitybot', 'youbot', 'diffbot',
        'google-extended', 'deepseekbot', 'mistralbot', 'xai-bot',
        'ai2bot', 'oai-searchbot', 'duckassistbot',
    ];

    private const EXEMPT_PATHS = [
        '/robots.txt',
        '/sitemap.xml',
        '/favicon.ico',
    ];

    public function process(
        ServerRequestInterface $request,
        RequestHandlerInterface $handler
    ): ResponseInterface {
        // Set noai meta attribute for templates
        $request = $request->withAttribute('robots', 'noai, noimageai');

        // Exempt paths bypass blocking
        $path = $request->getUri()->getPath();
        if (in_array($path, self::EXEMPT_PATHS, true)) {
            return $this->passThrough($handler, $request);
        }

        // Check User-Agent
        $ua = strtolower($request->getHeaderLine('User-Agent'));
        foreach (self::AI_BOTS as $bot) {
            if (str_contains($ua, $bot)) {
                // Block: do NOT call $handler->handle()
                return (new Response())
                    ->withStatus(403)
                    ->withStringBody('Forbidden: AI crawlers are not permitted.');
            }
        }

        return $this->passThrough($handler, $request);
    }

    private function passThrough(
        RequestHandlerInterface $handler,
        ServerRequestInterface $request
    ): ResponseInterface {
        // PSR-7 IMMUTABILITY: withHeader() returns a NEW object, so return that object
        $response = $handler->handle($request);

        return $response->withHeader('X-Robots-Tag', 'noai, noimageai');
    }
}
PSR-7 immutability: the #1 gotcha
PSR-7 response objects are immutable. withHeader() returns a new object — it does not modify the existing response. The header is silently discarded if you don't capture the return value:
// ❌ WRONG: withHeader() return value discarded
$response->withHeader('X-Robots-Tag', 'noai, noimageai');
return $response; // Header NOT set

// ✅ CORRECT: capture the new object
$response = $response->withHeader('X-Robots-Tag', 'noai, noimageai');
return $response;
Registration in Application.php
Register the middleware in src/Application.php using prepend() inside the middleware() method:
<?php
// src/Application.php
namespace App;
use App\Middleware\AiBotBlocker;
use Cake\Http\BaseApplication;
use Cake\Http\MiddlewareQueue;
use Cake\Routing\Middleware\AssetMiddleware;
use Cake\Routing\Middleware\RoutingMiddleware;
use Cake\Http\Middleware\CsrfProtectionMiddleware;
use Cake\Http\Middleware\HttpsEnforcerMiddleware;
use Cake\Http\Middleware\SecurityHeadersMiddleware;
class Application extends BaseApplication
{
    public function middleware(MiddlewareQueue $middlewareQueue): MiddlewareQueue
    {
        $middlewareQueue
            // ✅ PREPEND: runs BEFORE CakePHP's built-in middleware
            ->prepend(new AiBotBlocker())
            // CakePHP's standard middleware stack (runs after bot blocker)
            ->add(new SecurityHeadersMiddleware())
            ->add(new HttpsEnforcerMiddleware())
            ->add(new AssetMiddleware([
                'cacheTime' => env('ASSET_CACHE_TIME', '+1 day'),
            ]))
            ->add(new RoutingMiddleware($this))
            ->add(new CsrfProtectionMiddleware([
                'httponly' => true,
            ]));

        return $middlewareQueue;
    }
}
- prepend() adds to the beginning of the queue (runs first)
- add() appends to the end of the queue (runs last)
- There is also insertAt(position, middleware) for precise placement
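The three operations differ only in where the new entry lands. The following sketch models that with a toy class in plain PHP; ToyQueue is a hypothetical stand-in that tracks names only, not CakePHP's MiddlewareQueue, and the middleware names are illustrative.

```php
<?php
declare(strict_types=1);

// Toy stand-in for a middleware queue; it models ordering only and is
// NOT CakePHP's MiddlewareQueue class.
final class ToyQueue
{
    /** @var string[] */
    private array $items = [];

    public function add(string $name): self
    {
        $this->items[] = $name; // append to the end
        return $this;
    }

    public function prepend(string $name): self
    {
        array_unshift($this->items, $name); // put at the beginning
        return $this;
    }

    public function insertAt(int $index, string $name): self
    {
        array_splice($this->items, $index, 0, [$name]); // insert at position
        return $this;
    }

    /** @return string[] */
    public function order(): array
    {
        return $this->items;
    }
}

$queue = (new ToyQueue())
    ->add('Routing')
    ->add('Csrf')
    ->prepend('AiBotBlocker')       // jumps ahead of Routing
    ->insertAt(1, 'ErrorHandler');  // lands between AiBotBlocker and Routing

echo implode(' -> ', $queue->order()), "\n";
```

Note that the prepended entry ends up first even though it was added after Routing and Csrf: the position depends on the operation, not on where the call appears in the method.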
Layer 2: noai meta tag
The middleware sets a robots attribute on the request via withAttribute(). Read it in your CakePHP layout template:
<!-- templates/layout/default.php -->
<?php
// Access the request attribute — default to noai if not set
$robots = $this->request->getAttribute('robots', 'noai, noimageai');
?>
<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="<?= h($robots) ?>">
<!-- rest of head -->
</head>
Override per-controller or per-action if needed:
// In a controller action: allow indexing for public pages
public function index(): ?Response
{
    // Override the robots attribute for this page only
    $this->request = $this->request->withAttribute('robots', 'index, follow');
    // ...

    return null; // no explicit response, so CakePHP renders the view as usual
}
Route-scoped middleware
To block AI bots only on specific routes (e.g., your API), register the middleware inside a route scope in config/routes.php with registerMiddleware() and applyMiddleware(), or do the same in a Plugin's routes() method:
<?php
// config/routes.php
use App\Middleware\AiBotBlocker;
use Cake\Routing\RouteBuilder;
return static function (RouteBuilder $routes): void {
    // Apply bot blocker only to /api/* routes
    $routes->scope('/api', function (RouteBuilder $builder): void {
        $builder->registerMiddleware('aiBotBlocker', new AiBotBlocker());
        $builder->applyMiddleware('aiBotBlocker');
        $builder->get('/data', ['controller' => 'Api', 'action' => 'data']);
        // ... more API routes
    });

    // Public routes: no bot blocking
    $routes->scope('/', function (RouteBuilder $builder): void {
        $builder->get('/', ['controller' => 'Pages', 'action' => 'index']);
    });
};
Route-scoped middleware requires RoutingMiddleware to be in the global queue (it is, by default). The route-scope middleware runs after routing resolves the matched scope.
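Because scoped middleware only runs after routing, an alternative is to keep the blocker in the global queue and have it check the request path itself. This is a different technique from CakePHP's route scopes, shown here only as a sketch; isProtectedPath() is a hypothetical helper and the /api prefix is an assumption taken from the example above.

```php
<?php
declare(strict_types=1);

/**
 * Return true when $path is inside one of the protected prefixes.
 * Hypothetical helper; matches whole path segments so that
 * "/api" covers "/api" and "/api/data" but not "/apiary".
 *
 * @param string[] $prefixes
 */
function isProtectedPath(string $path, array $prefixes): bool
{
    foreach ($prefixes as $prefix) {
        $prefix = rtrim($prefix, '/');
        if ($path === $prefix || strpos($path, $prefix . '/') === 0) {
            return true;
        }
    }

    return false;
}

var_dump(isProtectedPath('/api/data', ['/api'])); // bool(true)
var_dump(isProtectedPath('/apiary', ['/api']));   // bool(false)
var_dump(isProtectedPath('/', ['/api']));         // bool(false)
```

Inside a middleware's process() method, a check like this would use $request->getUri()->getPath() as the path and pass the request straight to the handler when it returns false.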
CakePHP 4 vs CakePHP 5
CakePHP 4: PHP 7.4+ / 8.x
// src/Application.php (CakePHP 4)
// Identical pattern: prepend() and PSR-15 work the same
$middlewareQueue->prepend(new AiBotBlocker());
// str_contains() requires PHP 8.0; use strpos() for PHP 7.4:
// if (strpos($ua, $bot) !== false) { ... }
CakePHP 5: PHP 8.1+ required
// src/Application.php (CakePHP 5)
// Same pattern, no changes needed
$middlewareQueue->prepend(new AiBotBlocker());
// PHP 8.1+ guaranteed, so str_contains() is safe
// CakePHP 5 also uses named arguments; no impact on middleware
CakePHP vs Slim 4 vs Symfony vs Laravel: registration comparison
CakePHP: MiddlewareQueue::prepend()
// src/Application.php
public function middleware(MiddlewareQueue $middlewareQueue): MiddlewareQueue
{
    $middlewareQueue->prepend(new AiBotBlocker());
    // ... built-in middleware
    return $middlewareQueue;
}
Slim 4: $app->addMiddleware()
// index.php or routes.php
$app->addMiddleware(new AiBotBlocker());
// Slim adds middleware in LIFO order: last added = outermost = runs first
Symfony: EventSubscriber on KernelEvents::REQUEST
// config/services.yaml auto-registers via autoconfigure: true
// src/EventSubscriber/AiBotBlockerSubscriber.php
// KernelEvents::REQUEST at priority 9999: highest priority runs first
Laravel: withMiddleware() in bootstrap/app.php
// bootstrap/app.php (Laravel 11)
->withMiddleware(function (Middleware $middleware) {
    $middleware->prepend(AiBotBlocker::class);
})
Testing
Use Cake\TestSuite\IntegrationTestTrait and its configRequest() method to send custom headers:
<?php
// tests/TestCase/Middleware/AiBotBlockerTest.php
namespace App\Test\TestCase\Middleware;
use Cake\TestSuite\IntegrationTestTrait;
use Cake\TestSuite\TestCase;
class AiBotBlockerTest extends TestCase
{
    use IntegrationTestTrait;

    public function testBlocksAiBot(): void
    {
        $this->configRequest([
            'headers' => ['User-Agent' => 'GPTBot/1.0'],
        ]);
        $this->get('/articles');
        $this->assertResponseCode(403);
    }

    public function testAllowsBrowser(): void
    {
        $this->configRequest([
            'headers' => ['User-Agent' => 'Mozilla/5.0 (compatible)'],
        ]);
        $this->get('/articles');
        $this->assertResponseOk();
        $this->assertHeaderContains('X-Robots-Tag', 'noai, noimageai');
    }

    public function testRobotsTxtExempt(): void
    {
        // robots.txt is served by the web server, so check the file directly
        $this->assertFileExists(WWW_ROOT . 'robots.txt');
    }
}
Run with vendor/bin/phpunit tests/TestCase/Middleware/.
AI bot User-Agent strings (2026)
Lowercase both the incoming User-Agent (with strtolower()) and the entries in the list so the comparison is case-insensitive. Check each entry with str_contains($ua, $bot) on PHP 8.0+, or strpos($ua, $bot) !== false on PHP 7.4.
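The matching rule can be pulled out into a small standalone function, which also makes it easy to test without booting the framework. This is a minimal sketch; matchAiBot() is a hypothetical helper and the three-entry list is a shortened sample, not the full list used in the middleware.

```php
<?php
declare(strict_types=1);

/**
 * Return the first bot token found in the User-Agent, or null.
 * Tokens in $bots are expected to be lowercase already.
 *
 * @param string[] $bots
 */
function matchAiBot(string $userAgent, array $bots): ?string
{
    $ua = strtolower($userAgent);
    foreach ($bots as $bot) {
        // strpos() !== false works on PHP 7.4 and 8.x alike; on PHP 8.0+
        // str_contains($ua, $bot) is equivalent for non-empty needles.
        if ($bot !== '' && strpos($ua, $bot) !== false) {
            return $bot;
        }
    }

    return null;
}

$bots = ['gptbot', 'claudebot', 'ccbot'];

var_dump(matchAiBot('Mozilla/5.0 (compatible; GPTBot/1.0)', $bots));          // string(6) "gptbot"
var_dump(matchAiBot('Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0', $bots)); // NULL
```

Returning the matched token rather than a boolean is a design choice that makes it simple to log which crawler was blocked.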