How to Block AI Bots in Perl Mojolicious
Mojolicious is a full-stack Perl web framework with a real-time capable non-blocking I/O core. It is widely used in bioinformatics pipelines, enterprise Perl shops, and DevOps tooling that exposes REST APIs. Mojolicious provides a hook system for cross-cutting concerns: before_dispatch fires before routing and static-file serving, making it the right place to block AI crawlers. The Mojolicious-specific detail: calling $c->render() inside a hook marks the transaction as rendered, which tells Mojolicious to skip static dispatch and routing entirely — no route handler is ever called.
1. Bot pattern module
Keep the pattern list in a separate module so it is shared by the hook, any under route guards, and test files. index() is a literal substring search — no regex engine, no backtracking. Apply lc() to the UA once before iterating.
# lib/MyApp/BotFilter.pm
package MyApp::BotFilter;
use strict;
use warnings;
use Exporter 'import';
our @EXPORT_OK = qw(is_ai_bot);
# Pattern list shared by hook, under guard, and tests
my @AI_BOT_PATTERNS = qw(
gptbot
chatgpt-user
claudebot
anthropic-ai
ccbot
google-extended
cohere-ai
meta-externalagent
bytespider
omgili
diffbot
imagesiftbot
magpie-crawler
amazonbot
dataprovider
netcraft
);
sub is_ai_bot {
my ($ua) = @_;
return 0 unless defined $ua && length $ua;
my $ua_lower = lc $ua;
# index() — literal substring, no regex overhead
return 1 if grep { index($ua_lower, $_) >= 0 } @AI_BOT_PATTERNS;
return 0;
}
1;
2. before_dispatch hook — global protection
Register the hook in startup(). The hook receives the Mojolicious controller $c. Calling $c->render() marks the response as finished; Mojolicious checks this flag after all hooks run and skips dispatch. Always return after rendering to stop the hook sub from falling through.
# lib/MyApp.pm
package MyApp;
use Mojo::Base 'Mojolicious', -signatures;
use MyApp::BotFilter qw(is_ai_bot);
sub startup ($app) {
# ── AI-bot hook — runs before routing AND static-file dispatch ────────────
$app->hook(before_dispatch => sub ($c) {
# Always let AI crawlers read robots.txt so they see they're blocked
return if $c->req->url->path eq '/robots.txt';
my $ua = $c->req->headers->user_agent // '';
if (is_ai_bot($ua)) {
$c->res->headers->header('X-Robots-Tag' => 'noai, noimageai');
# $c->render() marks the transaction as rendered —
# Mojolicious will skip static-file serving and routing.
$c->render(text => 'Forbidden', status => 403);
return; # stop hook sub (transaction is already rendered)
}
# Pass-through: add header, continue dispatch normally
$c->res->headers->header('X-Robots-Tag' => 'noai, noimageai');
});
# ── Routes ────────────────────────────────────────────────────────────────
my $r = $app->routes;
$r->get('/')->to('main#index');
$r->get('/api/data')->to('api#data');
}
1;
3. under route guard — partial protection
Use an under route to protect a specific branch of your routing tree. The callback returns 1 (continue) or 0 (halt). Returning 0 without calling $c->render() first does not produce an error page: Mojolicious renders nothing on its own and the request simply waits until the inactivity timeout, so always call $c->render() yourself before returning 0 to control the response status, body, and headers.
# lib/MyApp.pm — under route guard variant
# Use this when you want to protect a subset of routes,
# not every request site-wide.
package MyApp;
use Mojo::Base 'Mojolicious', -signatures;
use MyApp::BotFilter qw(is_ai_bot);
sub startup ($app) {
my $r = $app->routes;
# Public — static assets, robots.txt, health check
$r->get('/robots.txt')->to('static#robots');
$r->get('/health')->to('health#check');
# Protected — under callback returns 0 (block) or 1 (allow)
my $protected = $r->under('/')->to(cb => sub ($c) {
my $ua = $c->req->headers->user_agent // '';
if (is_ai_bot($ua)) {
$c->res->headers->header('X-Robots-Tag' => 'noai, noimageai');
# Returning 0 tells Mojolicious to stop the dispatch chain.
# Nothing is rendered automatically, so render the 403 first.
$c->render(text => 'Forbidden', status => 403);
return 0; # halt chain
}
$c->res->headers->header('X-Robots-Tag' => 'noai, noimageai');
return 1; # continue to child routes
});
# Child routes — only reached when under callback returns 1
$protected->get('/')->to('main#index');
$protected->get('/api/data')->to('api#data');
$protected->post('/api/submit')->to('api#submit');
}
1;
4. Plack/PSGI middleware — deployment-layer blocking
Mojolicious apps can be deployed as PSGI applications via to_psgi_app(). In PSGI, headers arrive in the environment hash as HTTP_USER_AGENT — uppercase, hyphens replaced with underscores, HTTP_ prefix (same as the Rook spec used by R Plumber). Plack middleware wraps the app in a closure: return a 3-element arrayref to short-circuit, or call the inner app and mutate its response to pass through.
# app.psgi — Plack/PSGI middleware variant
# Deploy with: plackup app.psgi
# Note: running natively (hypnotoad myapp.pl) bypasses this Plack middleware.
use strict;
use warnings;
use Plack::Builder;
use MyApp;
# Duplicate of the pattern list in MyApp::BotFilter; keep the two in sync
my @AI_BOT_PATTERNS = qw(
gptbot chatgpt-user claudebot anthropic-ai ccbot
google-extended cohere-ai meta-externalagent bytespider
omgili diffbot imagesiftbot magpie-crawler amazonbot
dataprovider netcraft
);
sub is_ai_bot {
my ($ua) = @_;
return 0 unless defined $ua && length $ua;
my $ua_lower = lc $ua;
return grep { index($ua_lower, $_) >= 0 } @AI_BOT_PATTERNS;
}
my $app = MyApp->new->to_psgi_app;
builder {
# Plack middleware — PSGI env uses HTTP_USER_AGENT (CGI-style naming)
enable sub {
my $inner = shift;
return sub {
my $env = shift;
# Bypass for robots.txt — let bots discover they're blocked
unless ($env->{PATH_INFO} eq '/robots.txt') {
my $ua = $env->{HTTP_USER_AGENT} // '';
if (is_ai_bot($ua)) {
return [
403,
[
'Content-Type' => 'text/plain; charset=utf-8',
'X-Robots-Tag' => 'noai, noimageai',
],
['Forbidden'],
];
}
}
# Pass-through: append header to the response.
# Assumes an immediate arrayref response; apps that return a
# delayed-response coderef need Plack::Util::response_cb instead.
my $res = $inner->($env);
push @{$res->[1]}, 'X-Robots-Tag', 'noai, noimageai';
return $res;
};
};
$app;
};
5. Mojolicious::Lite — single-file apps
Mojolicious::Lite exports the hook keyword at the script level — same behaviour, no class required. Useful for small internal tools or quick prototypes.
# Mojolicious::Lite variant — single-file apps
use Mojolicious::Lite -signatures;
use MyApp::BotFilter qw(is_ai_bot);
hook before_dispatch => sub ($c) {
return if $c->req->url->path eq '/robots.txt';
my $ua = $c->req->headers->user_agent // '';
if (is_ai_bot($ua)) {
$c->res->headers->header('X-Robots-Tag' => 'noai, noimageai');
$c->render(text => 'Forbidden', status => 403);
return;
}
$c->res->headers->header('X-Robots-Tag' => 'noai, noimageai');
};
get '/' => sub ($c) { $c->render(text => 'Hello') };
app->start;
6. public/robots.txt
Mojolicious serves files from the public/ directory automatically — public/robots.txt is the correct location. The before_dispatch path check (return if path eq '/robots.txt') is mandatory because before_dispatch fires before static file serving, not after. Without the bypass, AI crawlers cannot fetch the file that tells them they are disallowed.
# public/robots.txt
# Mojolicious serves public/ as static files — place robots.txt here.
# The before_dispatch path-bypass ensures bots can read this file.
User-agent: *
Allow: /
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
Key points
- before_dispatch fires first: The hook runs before static-file serving and before routing. Calling $c->render() in the hook marks the transaction as rendered, causing Mojolicious to skip both static dispatch and the router entirely.
- render() + return: Always call return after $c->render() inside a hook. The transaction is finished, but Perl will keep executing the sub unless you explicitly return.
- under returns 0 or 1: In an under route callback, return 0 to halt the chain and 1 to continue. Returning 0 without rendering leaves the request waiting (nothing is rendered automatically), so call $c->render() yourself before returning 0 to control the response body and headers.
- index() over regex: index($string, $substring) is a literal substring search with no regex engine. For 16 fixed bot patterns it is fast and produces no backtracking surprises.
- PSGI env = CGI-style names: In PSGI/Plack, headers arrive as HTTP_USER_AGENT in the $env hash: uppercase, hyphens replaced with underscores, HTTP_ prefix. The same convention as R Plumber (Rook spec) and Python's WSGI.
- Mojolicious::Lite vs full app: Both share the same hook system. In Lite, hook is exported as a keyword; in a full app, call $app->hook(before_dispatch => sub {...}) inside startup().
- robots.txt bypass is mandatory: before_dispatch fires before Mojolicious checks the public/ directory. Without the path guard, the hook blocks robots.txt itself and AI crawlers can never read the disallow rules.
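The behaviour summarized above can be verified with Test::Mojo, which ships with Mojolicious. A minimal test sketch, assuming the full app from section 2 lives in lib/ and MyApp::BotFilter exports is_ai_bot:

```perl
# t/bot_filter.t — sketch; run with: prove -l t/
use Mojo::Base -strict;
use Test::More;
use Test::Mojo;
use MyApp::BotFilter qw(is_ai_bot);

# Unit level: the shared pattern module
ok is_ai_bot('Mozilla/5.0 (compatible; GPTBot/1.0)'), 'GPTBot detected';
ok !is_ai_bot('Mozilla/5.0 (X11; Linux x86_64) Firefox/127.0'),
    'regular browser passes';

# Integration level: the hook short-circuits before routing
my $t = Test::Mojo->new('MyApp');
$t->get_ok('/' => {'User-Agent' => 'GPTBot/1.0'})
  ->status_is(403)
  ->header_is('X-Robots-Tag' => 'noai, noimageai');
$t->get_ok('/' => {'User-Agent' => 'Mozilla/5.0'})
  ->status_is(200);

done_testing;
```

Because the hook renders before dispatch, the 403 assertion passes without any route handler being invoked.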
Framework comparison — Perl and Ruby web frameworks
| Framework | Hook / middleware | Short-circuit | UA header |
|---|---|---|---|
| Mojolicious | hook before_dispatch | $c->render(status=>403); return | $c->req->headers->user_agent |
| Dancer2 (Perl) | hook before | send_error("Forbidden", 403) | request->header("User-Agent") |
| Ruby Rails | before_action | render plain: "Forbidden", status: 403 | request.user_agent |
| Ruby Sinatra | before block | halt 403, "Forbidden" | request.user_agent |
Mojolicious and Dancer2 (Perl) both use a hook/before keyword pattern. The key difference from Ruby frameworks: Mojolicious requires calling $c->render() explicitly to mark the response — there is no equivalent of Sinatra's halt that automatically stops the pipeline. Dancer2's send_error() is closer to Sinatra's halt in that it throws an exception that Dancer2 catches and renders.
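For comparison, the same guard in Dancer2 looks like this. A sketch only, assuming the shared MyApp::BotFilter module from section 1 (with is_ai_bot exported); send_error() throws, so no explicit return is needed after it:

```perl
# Dancer2 variant — sketch, reuses MyApp::BotFilter from section 1
package MyDancerApp;
use Dancer2;
use MyApp::BotFilter qw(is_ai_bot);

hook before => sub {
    # Same robots.txt bypass as the Mojolicious hook
    return if request->path eq '/robots.txt';
    my $ua = request->header('User-Agent') // '';
    if (is_ai_bot($ua)) {
        response_header 'X-Robots-Tag' => 'noai, noimageai';
        # send_error throws an exception that Dancer2 catches and
        # renders as the response — unlike Mojolicious, execution
        # of the hook stops here automatically
        send_error('Forbidden', 403);
    }
    response_header 'X-Robots-Tag' => 'noai, noimageai';
};

get '/' => sub { 'Hello' };

1;
```

The exception-based halt is why the Dancer2 column of the table has no separate return step.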
Dependencies
The hook uses only Mojolicious core — no additional CPAN modules are required for the application-level approach. The PSGI variant requires Plack, which is a separate CPAN distribution (Mojolicious itself has no non-core dependencies), so install it explicitly for the middleware deployment.
# Install from CPAN
cpanm Mojolicious
# Or using carton for reproducible environments
# cpanfile:
requires 'Mojolicious', '>= 9.0';
# PSGI deployment
cpanm Plack # includes plackup and Plack::Runner
# Production server (non-blocking, multi-process)
# Hypnotoad ships with Mojolicious; no separate install needed
# Start: hypnotoad script/my_app