
How to Block AI Bots in Perl Mojolicious

Mojolicious is a full-stack Perl web framework with a real-time capable non-blocking I/O core. It is widely used in bioinformatics pipelines, enterprise Perl shops, and DevOps tooling that exposes REST APIs. Mojolicious provides a hook system for cross-cutting concerns: before_dispatch fires before routing and static-file serving, making it the right place to block AI crawlers. The Mojolicious-specific detail: calling $c->render() inside a hook marks the transaction as rendered, which tells Mojolicious to skip static dispatch and routing entirely — no route handler is ever called.

1. Bot pattern module

Keep the pattern list in a separate module so it is shared by the hook, any under route guards, and test files. index() is a literal substring search — no regex engine, no backtracking. Apply lc() to the UA once before iterating.

# lib/MyApp/BotFilter.pm
package MyApp::BotFilter;
use strict;
use warnings;

# Export is_ai_bot so callers can write: use MyApp::BotFilter qw(is_ai_bot);
use Exporter 'import';
our @EXPORT_OK = qw(is_ai_bot);

# Pattern list shared by hook, under guard, and tests
my @AI_BOT_PATTERNS = qw(
    gptbot
    chatgpt-user
    claudebot
    anthropic-ai
    ccbot
    google-extended
    cohere-ai
    meta-externalagent
    bytespider
    omgili
    diffbot
    imagesiftbot
    magpie-crawler
    amazonbot
    dataprovider
    netcraft
);

sub is_ai_bot {
    my ($ua) = @_;
    return 0 unless defined $ua && length $ua;
    my $ua_lower = lc $ua;
    # index() — literal substring, no regex overhead
    return 1 if grep { index($ua_lower, $_) >= 0 } @AI_BOT_PATTERNS;
    return 0;
}

1;
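The pattern module is easy to unit-test in isolation. A minimal sketch with Test::More — the t/bot_filter.t path and the sample UA strings are illustrative:

```perl
# t/bot_filter.t — unit tests for MyApp::BotFilter
use strict;
use warnings;
use Test::More;
use lib 'lib';
use MyApp::BotFilter qw(is_ai_bot);

# Known AI-crawler UAs match, case-insensitively
ok is_ai_bot('Mozilla/5.0 (compatible; GPTBot/1.0)'), 'GPTBot blocked';
ok is_ai_bot('CLAUDEBOT'),                            'match is case-insensitive';

# Browsers, empty and undefined UAs pass
ok !is_ai_bot('Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0'), 'browser allowed';
ok !is_ai_bot(''),    'empty UA allowed';
ok !is_ai_bot(undef), 'undef UA allowed';

done_testing;
```

Run with `prove -l t/bot_filter.t`.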

2. before_dispatch hook — global protection

Register the hook in startup(). The hook receives the Mojolicious controller $c. Calling $c->render() marks the response as finished; Mojolicious checks this flag after all hooks run and skips dispatch. Always return after rendering to stop the hook sub from falling through.

# lib/MyApp.pm
package MyApp;
use Mojo::Base 'Mojolicious', -signatures;
use MyApp::BotFilter qw(is_ai_bot);

sub startup ($app) {
    # ── AI-bot hook — runs before routing AND static-file dispatch ────────────
    $app->hook(before_dispatch => sub ($c) {
        # Always let AI crawlers read robots.txt so they see they're blocked
        return if $c->req->url->path eq '/robots.txt';

        my $ua = $c->req->headers->user_agent // '';

        if (is_ai_bot($ua)) {
            $c->res->headers->header('X-Robots-Tag' => 'noai, noimageai');
            # $c->render() marks the transaction as rendered —
            # Mojolicious will skip static-file serving and routing.
            $c->render(text => 'Forbidden', status => 403);
            return;  # stop hook sub (transaction is already rendered)
        }

        # Pass-through: add header, continue dispatch normally
        $c->res->headers->header('X-Robots-Tag' => 'noai, noimageai');
    });

    # ── Routes ────────────────────────────────────────────────────────────────
    my $r = $app->routes;
    $r->get('/')->to('main#index');
    $r->get('/api/data')->to('api#data');
}

1;
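Test::Mojo, which ships with Mojolicious, can exercise the hook end to end. A sketch assuming the MyApp class above is on @INC:

```perl
# t/block_bots.t — integration test for the before_dispatch hook
use strict;
use warnings;
use Test::More;
use Test::Mojo;

my $t = Test::Mojo->new('MyApp');

# Spoofed AI-crawler UA: 403 before any route handler runs
$t->get_ok('/' => {'User-Agent' => 'GPTBot/1.0'})
  ->status_is(403)
  ->header_is('X-Robots-Tag' => 'noai, noimageai')
  ->content_is('Forbidden');

# robots.txt bypass: even a blocked UA can read the policy file
$t->get_ok('/robots.txt' => {'User-Agent' => 'ClaudeBot/1.0'})
  ->status_is(200);

# Normal browser passes through, advisory header attached
$t->get_ok('/' => {'User-Agent' => 'Mozilla/5.0 Firefox/120.0'})
  ->status_is(200)
  ->header_is('X-Robots-Tag' => 'noai, noimageai');

done_testing;
```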

3. under route guard — partial protection

Use an under route to protect a specific branch of your routing tree. The callback returns 1 (continue) or 0 (halt). Returning 0 breaks the dispatch chain, but Mojolicious does not render a default response for you — without an explicit $c->render() the request simply hangs until the inactivity timeout, so always render the 403 yourself before returning 0.

# lib/MyApp.pm — under route guard variant
# Use this when you want to protect a subset of routes,
# not every request site-wide.

package MyApp;
use Mojo::Base 'Mojolicious', -signatures;
use MyApp::BotFilter qw(is_ai_bot);

sub startup ($app) {
    my $r = $app->routes;

    # Public — static assets, robots.txt, health check
    $r->get('/robots.txt')->to('static#robots');
    $r->get('/health')->to('health#check');

    # Protected — under callback returns 0 (block) or 1 (allow)
    my $protected = $r->under('/')->to(cb => sub ($c) {
        my $ua = $c->req->headers->user_agent // '';

        if (is_ai_bot($ua)) {
            $c->res->headers->header('X-Robots-Tag' => 'noai, noimageai');
            # Returning 0 breaks the dispatch chain. Mojolicious renders
            # nothing on its own, so render the 403 before returning.
            $c->render(text => 'Forbidden', status => 403);
            return 0;  # halt chain
        }

        $c->res->headers->header('X-Robots-Tag' => 'noai, noimageai');
        return 1;  # continue to child routes
    });

    # Child routes — only reached when under callback returns 1
    $protected->get('/')->to('main#index');
    $protected->get('/api/data')->to('api#data');
    $protected->post('/api/submit')->to('api#submit');
}

1;
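The public/protected split is worth a regression test of its own: a bot UA should still reach the public branch but never a child of the under. A sketch against the variant above:

```perl
# t/under_guard.t — the guard blocks child routes only
use strict;
use warnings;
use Test::More;
use Test::Mojo;

my $t = Test::Mojo->new('MyApp');

# Public branch: reachable even with a bot UA
$t->get_ok('/health' => {'User-Agent' => 'GPTBot/1.0'})
  ->status_is(200);

# Protected branch: under callback returns 0, child handler never runs
$t->get_ok('/api/data' => {'User-Agent' => 'GPTBot/1.0'})
  ->status_is(403)
  ->content_is('Forbidden');

done_testing;
```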

4. Plack/PSGI middleware — deployment-layer blocking

Mojolicious apps can be deployed as PSGI applications via to_psgi_app(). In PSGI, request headers arrive in the environment hash under CGI-style keys: User-Agent becomes HTTP_USER_AGENT (uppercased, hyphens replaced with underscores, HTTP_ prefix — the same convention as the Rook spec used by R Plumber). Plack middleware wraps the app in a closure: return a three-element arrayref to short-circuit, or call the inner app and mutate its response to pass through.

# app.psgi — Plack/PSGI middleware variant
# Deploy with: plackup app.psgi
# or: hypnotoad myapp.pl (Mojolicious native)

use strict;
use warnings;
use Plack::Builder;
use MyApp;

# Re-use the same pattern list for the Plack layer
my @AI_BOT_PATTERNS = qw(
    gptbot chatgpt-user claudebot anthropic-ai ccbot
    google-extended cohere-ai meta-externalagent bytespider
    omgili diffbot imagesiftbot magpie-crawler amazonbot
    dataprovider netcraft
);

sub is_ai_bot {
    my ($ua) = @_;
    return 0 unless defined $ua && length $ua;
    my $ua_lower = lc $ua;
    # Return 1/0 explicitly so the sub is safe in list context too
    return 1 if grep { index($ua_lower, $_) >= 0 } @AI_BOT_PATTERNS;
    return 0;
}

my $app = MyApp->new->to_psgi_app;

builder {
    # Plack middleware — PSGI env uses HTTP_USER_AGENT (CGI-style naming)
    enable sub {
        my $inner = shift;
        return sub {
            my $env = shift;

            # Bypass for robots.txt — let bots discover they're blocked
            unless ($env->{PATH_INFO} eq '/robots.txt') {
                my $ua = $env->{HTTP_USER_AGENT} // '';
                if (is_ai_bot($ua)) {
                    return [
                        403,
                        [
                            'Content-Type'  => 'text/plain; charset=utf-8',
                            'X-Robots-Tag'  => 'noai, noimageai',
                        ],
                        ['Forbidden'],
                    ];
                }
            }

            # Pass-through — append header to the response
            # (assumes an arrayref response; streaming/coderef responses
            # would need Plack::Util::response_cb instead)
            my $res = $inner->($env);
            push @{$res->[1]}, 'X-Robots-Tag', 'noai, noimageai';
            return $res;
        };
    };

    $app;
};
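Plack::Test (part of the Plack distribution) drives the middleware without a live server. A sketch assuming the app.psgi above sits in the working directory:

```perl
# t/psgi_block.t — exercise the Plack middleware layer
use strict;
use warnings;
use Test::More;
use Plack::Test;
use Plack::Util;
use HTTP::Request::Common;

my $app = Plack::Util::load_psgi('app.psgi');

test_psgi $app, sub {
    my $cb = shift;

    # AI-crawler UA is short-circuited: 403 arrayref, inner app never called
    my $res = $cb->(GET '/', 'User-Agent' => 'GPTBot/1.0');
    is $res->code, 403, 'bot blocked at the Plack layer';
    is $res->header('X-Robots-Tag'), 'noai, noimageai', 'advisory header set';

    # Browser UA reaches the wrapped Mojolicious app
    $res = $cb->(GET '/', 'User-Agent' => 'Mozilla/5.0 Firefox/120.0');
    is $res->code, 200, 'browser passes through';
};

done_testing;
```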

5. Mojolicious::Lite — single-file apps

Mojolicious::Lite exports the hook keyword at the script level — same behaviour, no class required. Useful for small internal tools or quick prototypes.

# Mojolicious::Lite variant — single-file apps
use Mojolicious::Lite -signatures;
use MyApp::BotFilter qw(is_ai_bot);

hook before_dispatch => sub ($c) {
    return if $c->req->url->path eq '/robots.txt';
    my $ua = $c->req->headers->user_agent // '';
    if (is_ai_bot($ua)) {
        $c->res->headers->header('X-Robots-Tag' => 'noai, noimageai');
        $c->render(text => 'Forbidden', status => 403);
        return;
    }
    $c->res->headers->header('X-Robots-Tag' => 'noai, noimageai');
};

get '/' => sub ($c) { $c->render(text => 'Hello') };

app->start;
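Lite scripts are testable the same way: require the script, then build Test::Mojo with no arguments. A sketch, assuming the script above is saved as myapp.pl one directory above t/:

```perl
# t/lite_block.t — Test::Mojo against a Mojolicious::Lite script
use strict;
use warnings;
use Test::More;
use Test::Mojo;
use FindBin;

# require-ing the script defines the app; no server is started
require "$FindBin::Bin/../myapp.pl";

my $t = Test::Mojo->new;
$t->get_ok('/' => {'User-Agent' => 'ChatGPT-User/1.0'})->status_is(403);
$t->get_ok('/' => {'User-Agent' => 'Mozilla/5.0'})
  ->status_is(200)
  ->content_is('Hello');

done_testing;
```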

6. public/robots.txt

Mojolicious serves files from the public/ directory automatically — public/robots.txt is the correct location. The before_dispatch path check (return if path eq '/robots.txt') is mandatory because before_dispatch fires before static file serving, not after. Without the bypass, AI crawlers cannot fetch the file that tells them they are disallowed.

# public/robots.txt
# Mojolicious serves public/ as static files — place robots.txt here.
# The before_dispatch path-bypass ensures bots can read this file.

User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /


Framework comparison — Perl and Ruby web frameworks

Framework       | Hook / middleware    | Short-circuit                           | UA header
Mojolicious     | hook before_dispatch | $c->render(status => 403); return       | $c->req->headers->user_agent
Dancer2 (Perl)  | hook before          | send_error("Forbidden", 403)            | request->header("User-Agent")
Ruby Rails      | before_action        | render plain: "Forbidden", status: 403  | request.user_agent
Ruby Sinatra    | before block         | halt 403, "Forbidden"                   | request.user_agent

Mojolicious and Dancer2 (Perl) both use a hook/before keyword pattern. The key difference from Ruby frameworks: Mojolicious requires calling $c->render() explicitly to mark the response — there is no equivalent of Sinatra's halt that automatically stops the pipeline. Dancer2's send_error() is closer to Sinatra's halt in that it throws an exception that Dancer2 catches and renders.
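For comparison, the same guard in Dancer2 — a minimal sketch with the pattern list inlined for brevity:

```perl
# Dancer2 equivalent — send_error throws, so no explicit return is needed
use Dancer2;

my @AI_BOT_PATTERNS = qw(gptbot chatgpt-user claudebot ccbot bytespider);

hook before => sub {
    return if request->path eq '/robots.txt';
    my $ua = lc(request->header('User-Agent') // '');
    if (grep { index($ua, $_) >= 0 } @AI_BOT_PATTERNS) {
        # Throws an exception that Dancer2 catches and renders as a 403
        send_error('Forbidden', 403);
    }
};

get '/' => sub { 'Hello' };

start;
```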

Dependencies

The hook uses only Mojolicious core — no additional CPAN modules are required for the application-level approach. The PSGI variant requires Plack, which must be installed separately: Mojolicious itself has no non-core CPAN dependencies.

# Install from CPAN
cpanm Mojolicious

# Or using carton for reproducible environments
# cpanfile:
requires 'Mojolicious', '>= 9.0';

# PSGI deployment — Plack must be installed separately
cpanm Plack  # the Plack distribution includes plackup (Plack::Runner)

# Production server (non-blocking, multi-process)
# Hypnotoad ships with Mojolicious itself — nothing extra to install
# Start: hypnotoad script/my_app