
How to Block AI Bots in Perl Dancer2

Dancer2 is a lightweight Perl web framework built around a clean DSL and a PSGI-native architecture. It is widely used in enterprise Perl shops, internal tooling, and APIs where Mojolicious feels too heavyweight. Dancer2's hook keyword registers a before hook that fires before every route handler. The Dancer2-specific detail: short-circuiting in a before hook means calling send_error(), a single call that creates a Dancer2::Core::Error, marks the response as halted, and skips the route handler. This is simpler than Mojolicious (which needs both $c->render() and return) but produces the same outcome.

1. Bot pattern module

Define the patterns in a separate module and export the is_ai_bot() matcher with Exporter, so hooks, route handlers, and tests all share one list. index() is a literal substring search with no regex engine overhead. Apply lc() to the UA string once before iterating.

# lib/MyApp/BotFilter.pm
package MyApp::BotFilter;
use strict;
use warnings;
use Exporter 'import';

our @EXPORT_OK = qw(is_ai_bot);

# All lowercase — matched against lc($ua)
my @AI_BOT_PATTERNS = qw(
    gptbot
    chatgpt-user
    claudebot
    anthropic-ai
    ccbot
    google-extended
    cohere-ai
    meta-externalagent
    bytespider
    omgili
    diffbot
    imagesiftbot
    magpie-crawler
    amazonbot
    dataprovider
    netcraft
);

sub is_ai_bot {
    my ($ua) = @_;
    return 0 unless defined $ua && length $ua;
    my $ua_lower = lc $ua;
    # index() — literal substring, no regex engine, no backtracking
    return 1 if grep { index($ua_lower, $_) >= 0 } @AI_BOT_PATTERNS;
    return 0;
}

1;
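
A quick way to sanity-check the matcher is to run it against a few sample User-Agent strings. The sketch below inlines a shortened copy of the pattern list so that it runs on its own with core Perl only; the sample UA strings are illustrative assumptions, not captured traffic.

```perl
use strict;
use warnings;

# Shortened copy of the pattern list, inlined so the script is self-contained.
my @patterns = qw(gptbot claudebot ccbot bytespider);

sub is_ai_bot {
    my ($ua) = @_;
    return 0 unless defined $ua && length $ua;
    my $ua_lower = lc $ua;
    # grep in boolean context yields the match count; normalise to 1 or 0
    return ( grep { index( $ua_lower, $_ ) >= 0 } @patterns ) ? 1 : 0;
}

# Illustrative User-Agent strings (assumed for this sketch).
my @samples = (
    'Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.0)',
    'CCBot/2.0 (https://commoncrawl.org/faq/)',
    'Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0',
    '',
);

for my $ua (@samples) {
    my $verdict = is_ai_bot($ua) ? 'BLOCK' : 'allow';
    printf "%-5s %s\n", $verdict, ( length $ua ? $ua : '(empty)' );
}
```

Because index() matches substrings, any UA that merely contains a pattern (for example a browser extension that appends "ccbot") is blocked too, which is worth keeping in mind when adding short patterns.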

2. hook before + hook after — global protection

Register before and after hooks in the application. The before hook sets the header on the response object before calling send_error() — headers must be set before the error is raised. The after hook injects X-Robots-Tag into every response that was not blocked.

# lib/MyApp.pm — Dancer2 application
package MyApp;
use Dancer2;
use MyApp::BotFilter qw(is_ai_bot);

# ── AI-bot before hook — runs before every route handler ─────────────────────
hook before => sub {
    # Path guard: include this for safety across all deployment modes.
    # In PSGI deployments with Plack::Middleware::Static, robots.txt is
    # served before Dancer2 — this guard is a no-op there.
    # With the built-in dev server, robots.txt goes through this hook —
    # the guard ensures bots can still read the file and see they're blocked.
    return if request->path eq '/robots.txt';

    my $ua = request->header('User-Agent') // '';  # undef-safe

    if (is_ai_bot($ua)) {
        # Set header on the response object before short-circuiting
        response->header('X-Robots-Tag' => 'noai, noimageai');
        # send_error creates a Dancer2::Core::Error, marks response halted,
        # and skips the route handler — a single call stops everything
        send_error('Forbidden', 403);
    }
    # Pass-through: no explicit action needed here — after hook handles header
};

# ── after hook — add X-Robots-Tag to all passing responses ───────────────────
# Fires after the route handler for requests that were not blocked.
# Blocked requests are halted before reaching the route handler,
# so their header is set in the before hook above.
hook after => sub {
    response->header('X-Robots-Tag' => 'noai, noimageai');
};

# ── Routes ────────────────────────────────────────────────────────────────────
get '/' => sub {
    'Hello'
};

get '/api/data' => sub {
    content_type 'application/json';
    return '{"data":"value"}';
};

# Explicit robots.txt route (optional; public/robots.txt is auto-served in most setups).
# Without system_path, send_file resolves the path relative to the public/ directory.
get '/robots.txt' => sub {
    content_type 'text/plain';
    send_file 'robots.txt';
};

1;
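
The hook behaviour can be exercised without starting a server by driving the PSGI app through Plack::Test. The following is a minimal sketch, assuming Dancer2, Plack, and HTTP::Request::Common are installed; DemoApp is a hypothetical package used only here, and it matches a single pattern for brevity. It checks status codes and the header on allowed responses only. Whether a header set before send_error() survives onto the error response is worth confirming against your Dancer2 version.

```perl
use strict;
use warnings;

{
    package DemoApp;    # hypothetical app name, used only for this sketch
    use Dancer2;

    hook before => sub {
        return if request->path eq '/robots.txt';
        my $ua = lc( request->header('User-Agent') // '' );
        if ( index( $ua, 'gptbot' ) >= 0 ) {    # single pattern for brevity
            response->header( 'X-Robots-Tag' => 'noai, noimageai' );
            send_error( 'Forbidden', 403 );
        }
    };

    hook after => sub {
        response->header( 'X-Robots-Tag' => 'noai, noimageai' );
    };

    get '/' => sub { 'Hello' };

    get '/robots.txt' => sub {
        content_type 'text/plain';
        return "User-agent: GPTBot\nDisallow: /\n";
    };
}

use Plack::Test;
use HTTP::Request::Common qw(GET);

# Wrap the PSGI app in an in-process test client (no network involved).
my $client = Plack::Test->create( DemoApp->to_app );

my $bot_ua  = 'Mozilla/5.0 (compatible; GPTBot/1.0)';
my $blocked = $client->request( GET '/', 'User-Agent' => $bot_ua );
my $allowed = $client->request( GET '/', 'User-Agent' => 'Mozilla/5.0 Firefox/126.0' );
my $robots  = $client->request( GET '/robots.txt', 'User-Agent' => $bot_ua );

printf "bot on /            -> %d\n", $blocked->code;
printf "browser on /        -> %d\n", $allowed->code;
printf "bot on /robots.txt  -> %d\n", $robots->code;
printf "X-Robots-Tag (browser): %s\n", $allowed->header('X-Robots-Tag') // '(none)';
```

The same requests make a reasonable regression test for the real application: swap DemoApp for MyApp and keep the three cases (blocked bot, allowed browser, bot fetching robots.txt).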

3. Single-file script variant

Dancer2 also works as a single-file script — the DSL keywords are available at the script level without a class. Call dance at the end to start the server. Useful for quick prototypes or small internal tools.

#!/usr/bin/env perl
# app.pl — Dancer2 script (single-file variant)
use Dancer2;
use strict;
use warnings;

my @AI_BOT_PATTERNS = qw(
    gptbot chatgpt-user claudebot anthropic-ai ccbot
    google-extended cohere-ai meta-externalagent bytespider
    omgili diffbot imagesiftbot magpie-crawler amazonbot
    dataprovider netcraft
);

sub is_ai_bot {
    my ($ua) = @_;
    return 0 unless defined $ua && length $ua;
    my $lower = lc $ua;
    # Force a plain 1/0 result; a bare "return grep" returns a list in list context
    return ( grep { index($lower, $_) >= 0 } @AI_BOT_PATTERNS ) ? 1 : 0;
}

hook before => sub {
    return if request->path eq '/robots.txt';
    my $ua = request->header('User-Agent') // '';
    if (is_ai_bot($ua)) {
        response->header('X-Robots-Tag' => 'noai, noimageai');
        send_error('Forbidden', 403);
    }
};

hook after => sub {
    response->header('X-Robots-Tag' => 'noai, noimageai');
};

get '/' => sub { 'Hello' };

dance;

4. Dancer2 plugin — reusable across apps

Encapsulate the hooks in a Dancer2::Plugin subclass so the bot blocker can be added to any Dancer2 application with a single use statement. The on_plugin_import block runs when the plugin is loaded and registers hooks directly on the app object.

# lib/MyApp/Plugin/BotBlocker.pm — reusable Dancer2 plugin
# Encapsulates the bot-blocker hook for use across multiple Dancer2 apps.
package MyApp::Plugin::BotBlocker;
use Dancer2::Plugin;
use MyApp::BotFilter qw(is_ai_bot);

on_plugin_import {
    my $dsl = shift;

    $dsl->app->add_hook(
        Dancer2::Core::Hook->new(
            name => 'before',
            code => sub {
                return if $dsl->app->request->path eq '/robots.txt';
                my $ua = $dsl->app->request->header('User-Agent') // '';
                if (is_ai_bot($ua)) {
                    $dsl->app->response->header('X-Robots-Tag' => 'noai, noimageai');
                    $dsl->send_error('Forbidden', 403);
                }
            },
        )
    );

    $dsl->app->add_hook(
        Dancer2::Core::Hook->new(
            name  => 'after',
            code  => sub {
                $dsl->app->response->header('X-Robots-Tag' => 'noai, noimageai');
            },
        )
    );
};

register_plugin;
1;

# Usage in any Dancer2 app:
# use MyApp::Plugin::BotBlocker;

5. PSGI deployment with Plack::Middleware::Static

In production, Dancer2 is deployed as a PSGI app via to_app(). Wrapping it with Plack::Middleware::Static serves public/ files at the Plack layer — before Dancer2 handles the request. This means robots.txt is served without triggering Dancer2 hooks at all. The path guard in the before hook becomes a no-op but costs nothing to keep.

# app.psgi — production PSGI deployment
use strict;
use warnings;
use Plack::Builder;
use MyApp;

# Plack::Middleware::Static serves public/ BEFORE Dancer2 handles the request.
# robots.txt is served here — Dancer2 before hooks never fire for it.
# The path guard in the before hook is a no-op, but costs nothing to keep.

builder {
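    # Serve public/ directory (includes robots.txt) before Dancer2.
    # Dots are escaped so they match a literal "." rather than any character.
    enable 'Static',
        path => qr{^/robots\.txt$|^/favicon\.ico$|^/assets/},
        root => './public';

    # Optional: gzip compression
    # (Plack::Middleware::Deflater is a separate CPAN distribution)
    enable 'Deflater',
        content_type => ['text/html', 'application/json'];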

    # Dancer2 application
    MyApp->to_app;
};
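
A path pattern for the Static middleware is easy to check in isolation before deploying. In a regex, an unescaped . matches any character, so the sketch below compares an unescaped and an escaped form of the same pattern against a few paths. The sample paths are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Two variants of the same path pattern: dots unescaped vs. escaped.
my $unescaped = qr{^/robots.txt$|^/favicon.ico$|^/assets/};
my $escaped   = qr{^/robots\.txt$|^/favicon\.ico$|^/assets/};

# Illustrative request paths (assumed for this sketch).
my @paths = ( '/robots.txt', '/robots-txt', '/favicon.ico', '/assets/app.css', '/api/data' );

for my $path (@paths) {
    printf "%-16s unescaped=%d escaped=%d\n",
        $path,
        ( $path =~ $unescaped ? 1 : 0 ),
        ( $path =~ $escaped   ? 1 : 0 );
}
```

Only paths that match the pattern are handed to the static file handler; everything else falls through to the Dancer2 app and its hooks.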

6. public/robots.txt

Place robots.txt in the public/ directory. In PSGI deployments with Plack::Middleware::Static, it is served at the Plack layer and never reaches Dancer2. When the request does pass through Dancer2's hooks (for example, when the file is served by the explicit /robots.txt route), the path guard in the before hook ensures AI crawlers can still fetch the file.

# public/robots.txt
# Served from public/ — accessible to all crawlers even when blocker is active.

User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

Key points

Framework comparison — Perl web frameworks

Framework        | Hook                        | Short-circuit                       | UA header
-----------------|-----------------------------|-------------------------------------|-------------------------------
Dancer2          | hook before => sub          | send_error('Forbidden', 403)        | request->header('User-Agent')
Mojolicious      | hook before_dispatch => sub | $c->render(status=>403); return     | $c->req->headers->user_agent
Plack/PSGI (raw) | middleware closure          | return [403, [...], ['Forbidden']]  | $env->{HTTP_USER_AGENT}
Catalyst         | auto action                 | $c->res->status(403); $c->detach()  | $c->req->header('User-Agent')

Among the frameworks compared here, Dancer2's send_error() is the most concise short-circuit: one call versus Mojolicious's render+return or Catalyst's status+detach. Dancer2 and Catalyst are built on PSGI, and Mojolicious can also be deployed under a PSGI server, so all three support Plack middleware layering, which is where Plack::Middleware::Static can bypass hooks for static file requests entirely.
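
For comparison, the raw Plack/PSGI row can be sketched as a plain closure that wraps any PSGI app. This is a minimal illustration using core Perl only: block_ai_bots is a hypothetical helper name, the inner app is a stand-in, and the environment hashes are hand-built rather than produced by a real server.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a PSGI app with a User-Agent check.
sub block_ai_bots {
    my ( $app, @patterns ) = @_;
    return sub {
        my ($env) = @_;

        # Let every client read robots.txt, as in the Dancer2 path guard.
        return $app->($env) if ( $env->{PATH_INFO} // '' ) eq '/robots.txt';

        my $ua = lc( $env->{HTTP_USER_AGENT} // '' );
        if ( grep { index( $ua, $_ ) >= 0 } @patterns ) {
            return [
                403,
                [ 'Content-Type' => 'text/plain', 'X-Robots-Tag' => 'noai, noimageai' ],
                ['Forbidden'],
            ];
        }
        return $app->($env);    # pass through to the wrapped app
    };
}

# Stand-in inner app that always answers 200.
my $inner   = sub { [ 200, [ 'Content-Type' => 'text/plain' ], ['Hello'] ] };
my $wrapped = block_ai_bots( $inner, qw(gptbot claudebot) );

# Hand-built PSGI environments (only the keys this sketch reads).
my $bot_res     = $wrapped->( { PATH_INFO => '/',           HTTP_USER_AGENT => 'GPTBot/1.0' } );
my $browser_res = $wrapped->( { PATH_INFO => '/',           HTTP_USER_AGENT => 'Mozilla/5.0' } );
my $robots_res  = $wrapped->( { PATH_INFO => '/robots.txt', HTTP_USER_AGENT => 'GPTBot/1.0' } );

printf "bot: %d, browser: %d, bot on robots.txt: %d\n",
    $bot_res->[0], $browser_res->[0], $robots_res->[0];
```

A closure like this runs before any framework code, so it applies equally to Dancer2, Catalyst, or any other PSGI app mounted behind it.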

Dependencies

# Install from CPAN
cpanm Dancer2
cpanm Plack                        # PSGI toolkit and plackup runner; includes Plack::Middleware::Static
cpanm Plack::Middleware::Deflater  # separate distribution, needed only for the optional gzip step

# cpanfile:
requires 'Dancer2', '>= 1.0.0';
requires 'Plack', '>= 1.0047';

# Run the single-file script from section 3 (dance starts the built-in server)
perl app.pl

# Run with Plack (-I lib lets app.psgi find lib/MyApp.pm)
plackup -I lib app.psgi

# Production (Starman multi-process)
cpanm Starman
starman app.psgi --workers 4 --port 8080