
How to Block AI Bots on October CMS

October CMS is a modern PHP CMS built on Laravel — popular with agencies and developers who want WordPress's content editing with Laravel's developer experience. It uses Twig for front-end templates, an INI-based page file format, and a plugin architecture for extending core functionality. Unlike plain Laravel, October has its own routing layer (CMS pages), theme system, and backend editor. This guide covers all four AI bot protection layers for October CMS: robots.txt via static file or CMS page, noai meta in Twig layouts with per-page viewBag properties, X-Robots-Tag and hard 403 via Laravel middleware registered through the plugin system, and server-level Apache/Nginx blocking.

9 min read · Updated April 2026 · October CMS v3

1. robots.txt

October CMS runs on Laravel, which uses the public/ directory as the web document root. Apache and Nginx serve files from public/ directly, before any PHP is invoked, so public/robots.txt is the fastest and simplest option.

Option A: Static file in public/ (recommended)

Create public/robots.txt:

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Diffbot
Disallow: /

# Allow standard search crawlers
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /

Apache and Nginx serve this file before the request reaches October's PHP router — zero CMS overhead.

Option B: Dynamic CMS page

For environment-aware content or multi-site rules, create an October CMS page with the URL /robots.txt. October CMS pages use an INI header section (before ==) and a Twig template section (after ==).

Create themes/mytheme/pages/robots-txt.htm:

title = "Robots"
url = "/robots.txt"
==
{% header "Content-Type: text/plain; charset=utf-8" %}
{% if environment() == 'production' %}
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /
{% else %}
User-agent: *
Disallow: /
{% endif %}

Content-Type is required: October CMS pages output HTML by default. The {% header "Content-Type: text/plain; charset=utf-8" %} Twig tag overrides this before any output is sent. Without it, crawlers may ignore the file because it is served as text/html.

Static file vs CMS page priority: if both public/robots.txt and a CMS page at /robots.txt exist, the static file wins — the web server (Apache/Nginx) serves it before October's PHP router is consulted. Remove the static file when using the CMS page approach.

2. noai meta in Twig layouts

October CMS uses Twig for front-end templates. The base layout file is where you add the noai meta tag for site-wide coverage.

Theme layout

Edit your theme layout (e.g., themes/mytheme/layouts/default.htm):

description = "Default layout"
==
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">

    {# AI bot protection — site-wide default with per-page override #}
    <meta name="robots" content="{{ this.page.viewBag.robots ?? 'noai, noimageai' }}">

    <title>{{ this.page.title }} | {{ this.site.name }}</title>

    {% styles %}
</head>
<body>
    {% page %}

    {% scripts %}
</body>
</html>

The expression {{ this.page.viewBag.robots ?? 'noai, noimageai' }} reads a robots value from the page's viewBag properties (set by editors per page) and falls back to noai, noimageai when none is set.

this.page vs page: In October CMS Twig templates, this.page refers to the current CMS page object. this.page.title is the page title, this.page.viewBag is the viewBag property bag, this.page.url is the page URL. The this object gives you access to the current page, layout, theme, and site context.

3. Per-page control via viewBag

October CMS's viewBag is a special INI section that stores custom page properties editable in the CMS backend. Add a robots property here to give editors per-page control.

Page file with viewBag

title = "Blog Post"
url = "/blog/:slug"
layout = "default"

[viewBag]
robots = "index, follow"
==
<h1>{{ this.page.title }}</h1>
<p>Blog post content here.</p>

The [viewBag] section sits in the INI header (before ==). Properties defined here appear as editable fields in the October backend under the page's Properties tab — no plugin or custom field setup needed.

Accessing viewBag in partials

If you need the robots value in a partial (e.g., a shared head.htm partial), pass it through:

{# In layout or page: #}
{% partial 'head' robots=this.page.viewBag.robots %}

{# In partials/head.htm: #}
<meta name="robots" content="{{ robots ?? 'noai, noimageai' }}">

viewBag in the backend: When you add a property to [viewBag] in a page file, October CMS automatically creates an editable field for it in the backend page editor under Properties → Custom. Editors can set it without touching code — leaving it blank applies the layout default.

4. X-Robots-Tag via plugin middleware

October CMS extends Laravel's middleware system through its plugin architecture. The recommended way to register middleware is via a plugin's boot() method — keeping everything modular.

Create the middleware class

Create plugins/acme/aibots/middleware/AiBotMiddleware.php:

<?php
// plugins/acme/aibots/middleware/AiBotMiddleware.php
namespace Acme\AiBots\Middleware;

use Closure;
use Illuminate\Http\Request;

class AiBotMiddleware
{
    private const AI_BOT_PATTERNS = [
        'GPTBot', 'ClaudeBot', 'Claude-Web', 'anthropic-ai',
        'CCBot', 'Google-Extended', 'PerplexityBot', 'Applebot-Extended',
        'Amazonbot', 'meta-externalagent', 'Bytespider', 'Diffbot',
    ];

    private const EXEMPT_PATHS = [
        'robots.txt', 'sitemap.xml', 'favicon.ico',
    ];

    public function handle(Request $request, Closure $next): mixed
    {
        $path = ltrim($request->path(), '/');

        // Allow crawlers to access exempt paths
        if (in_array($path, self::EXEMPT_PATHS, true)) {
            return $next($request);
        }

        $ua = $request->userAgent() ?? '';

        if ($this->isAiBot($ua)) {
            return response('Forbidden', 403);
        }

        $response = $next($request);

        // Add X-Robots-Tag to HTML responses
        $contentType = $response->headers->get('Content-Type', '');
        if (str_contains($contentType, 'text/html')) {
            $response->headers->set('X-Robots-Tag', 'noai, noimageai');
        }

        return $response;
    }

    private function isAiBot(string $ua): bool
    {
        $lower = strtolower($ua);
        foreach (self::AI_BOT_PATTERNS as $pattern) {
            if (str_contains($lower, strtolower($pattern))) {
                return true;
            }
        }
        return false;
    }
}

Register in Plugin.php

Register the middleware in your plugin's Plugin.php:

<?php
// plugins/acme/aibots/Plugin.php
namespace Acme\AiBots;

use System\Classes\PluginBase;
use Acme\AiBots\Middleware\AiBotMiddleware;

class Plugin extends PluginBase
{
    public function pluginDetails(): array
    {
        return [
            'name'        => 'AI Bot Protection',
            'description' => 'Blocks AI training crawlers and adds X-Robots-Tag headers.',
            'author'      => 'Acme',
            'icon'        => 'icon-shield',
        ];
    }

    public function boot(): void
    {
        // Register for all web (front-end) routes
        $this->app['router']->pushMiddlewareToGroup('web', AiBotMiddleware::class);
    }
}

Plugin vs Kernel.php: Registering middleware via Plugin.php keeps your AI bot protection self-contained and portable. You can enable/disable it by toggling the plugin in October's backend. Adding it directly to app/Http/Kernel.php also works but couples it to the application rather than the plugin system.

Logging blocked requests

Add logging to the middleware for monitoring:

use Illuminate\Support\Facades\Log;

if ($this->isAiBot($ua)) {
    Log::warning('AI bot blocked', [
        'ua'   => $ua,
        'path' => $request->path(),
        'ip'   => $request->ip(),
    ]);
    return response('Forbidden', 403);
}

5. Hard 403 blocking

The middleware in Section 4 already includes hard 403 blocking — it checks the User-Agent before calling $next($request) and returns a 403 response early, so the request never reaches October's CMS controller or page rendering.

For a lightweight middleware that only blocks (no X-Robots-Tag header addition), use:

public function handle(Request $request, Closure $next): mixed
{
    $path = ltrim($request->path(), '/');

    if (in_array($path, ['robots.txt', 'sitemap.xml', 'favicon.ico'], true)) {
        return $next($request);
    }

    $ua = $request->userAgent() ?? '';
    if ($this->isAiBot($ua)) {
        abort(403, 'Forbidden');
    }

    return $next($request);
}

The abort(403) helper throws an HttpException that Laravel handles via the exception handler — no response object construction needed.

6. Apache and Nginx server-level blocking

Server-level UA blocking is more performant than PHP middleware — the request is rejected without invoking PHP or October at all.

Apache — public/.htaccess

# Add above October CMS's default .htaccess rewrite rules
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Allow crawlers to access essential files
    RewriteRule ^robots\.txt$ - [L]
    RewriteRule ^sitemap\.xml$ - [L]
    RewriteRule ^favicon\.ico$ - [L]

    # Block AI training crawlers
    RewriteCond %{HTTP_USER_AGENT} GPTBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ClaudeBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Claude-Web [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} anthropic-ai [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} CCBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Google-Extended [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} PerplexityBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Applebot-Extended [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Amazonbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} meta-externalagent [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Bytespider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Diffbot [NC]
    RewriteRule .* - [F,L]
</IfModule>

# October CMS default rules below
Options +FollowSymLinks
# ...

Nginx

# In your Nginx server block
map $http_user_agent $is_ai_bot {
    default            0;
    ~*GPTBot           1;
    ~*ClaudeBot        1;
    ~*Claude-Web       1;
    ~*anthropic-ai     1;
    ~*CCBot            1;
    ~*Google-Extended  1;
    ~*PerplexityBot    1;
    ~*Applebot-Extended 1;
    ~*Amazonbot        1;
    ~*meta-externalagent 1;
    ~*Bytespider       1;
    ~*Diffbot          1;
}

server {
    listen 443 ssl;
    server_name example.com;
    root /var/www/mysite/public;
    index index.php;

    # Set the noai header at server level so responses produced by the PHP
    # location inherit it (an add_header inside "location /" is lost after
    # the internal redirect to the PHP-FPM location)
    add_header X-Robots-Tag "noai, noimageai" always;

    # Serve static files (robots.txt, etc.) without bot check
    location ~* \.(txt|xml|ico|css|js|png|jpg|woff2)$ {
        try_files $uri =404;
        access_log off;
    }

    # Block AI bots from all other requests
    location / {
        if ($is_ai_bot) {
            return 403 "Forbidden";
        }

        try_files $uri /index.php$is_args$args;
    }

    location ~ \.php$ {
        fastcgi_pass unix:/run/php/php8.3-fpm.sock;
        fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
        include fastcgi_params;
    }
}

7. Deployment

October CMS runs on PHP 8.1+ with MySQL or PostgreSQL. Standard deployment is Apache or Nginx with PHP-FPM. All protection layers work across all standard PHP hosting environments.

Platform                 | robots.txt | Meta tags | X-Robots-Tag   | Hard 403
Apache + PHP-FPM         | ✓          | ✓         | ✓              | ✓
Nginx + PHP-FPM          | ✓          | ✓         | ✓              | ✓
Laravel Forge / Ploi     | ✓          | ✓         | ✓              | ✓
DigitalOcean / Linode    | ✓          | ✓         | ✓              | ✓
Shared hosting (cPanel)  | ✓          | ✓         | ✓ (.htaccess)  | ✓ (.htaccess)

Recommended combination

# Priority order for comprehensive protection:
1. public/robots.txt         — Disallow rules (zero PHP, served by webserver)
2. Nginx map + 403           — Server-level UA block (no PHP invoked)
3. Twig layout meta tag      — noai in <head> (crawlers that ignore server config)
4. Laravel middleware        — X-Robots-Tag on HTML responses

FAQ

How do I serve robots.txt in October CMS?

Place robots.txt in public/ — Apache and Nginx serve it directly before PHP is invoked. For dynamic content, create a CMS page with url = "/robots.txt" in the INI section and add {% header "Content-Type: text/plain; charset=utf-8" %} at the top of the Twig section. The static file takes priority over the CMS page — remove it when using the dynamic approach.

How do I add noai meta tags to every October CMS page?

Edit your theme layout and add <meta name="robots" content="{{ this.page.viewBag.robots ?? 'noai, noimageai' }}"> inside <head>. For per-page control, add [viewBag] to the page INI header with robots = "index, follow". Editors can set this in the October backend without touching code.

How do I register Laravel middleware in October CMS?

Override the boot() method in your plugin's Plugin.php and call $this->app['router']->pushMiddlewareToGroup('web', AiBotMiddleware::class). This registers it for all front-end (web) routes. Alternatively, add it directly to app/Http/Kernel.php, but the plugin approach keeps it modular.

What are October CMS viewBag properties?

viewBag is a special INI section in October page files that stores custom properties editable in the CMS backend. Add [viewBag] before the == separator and define key/value pairs. Access them in Twig as {{ this.page.viewBag.key }}. They appear as editable fields in the backend page editor under Properties, letting non-technical editors set values like robots tags per page.

How is October CMS different from plain Laravel for bot blocking?

October has its own routing layer (CMS pages) that runs before Laravel routes. Static files in public/ bypass both routers entirely — the web server serves them. Middleware registration is best done via Plugin.php boot() rather than Kernel.php. Twig templates replace Blade for front-end views, so meta tags are added in theme layout .htm files rather than Blade views.

Does blocking AI bots affect Googlebot?

No. The middleware and server configurations target AI training crawlers (GPTBot, ClaudeBot, CCBot, Google-Extended, PerplexityBot) — separate user agents from Googlebot and Bingbot. Standard search engine crawlers are not affected. Include explicit Allow: / rules for Googlebot and Bingbot in your robots.txt.
