Skip to content
Guides/Crystal Kemal

How to Block AI Bots on Crystal Kemal: Complete 2026 Guide

Kemal is a Sinatra-inspired web framework for Crystal — a Ruby-like language that compiles to native binaries. Filters run before route handlers: before_all fires on every request. The key mechanism: halt(env, status_code: 403) stops the request — but after_all filters are not called after a halt. Set headers before halting.

Set headers BEFORE halt — after_all is skipped

halt() raises a Kemal::Exceptions::CustomException internally. After it fires, Kemal sends the response immediately and skips all subsequent filters including after_all. This differs from Sinatra where after filters run even after halt. Always set X-Robots-Tag and Content-Type in your before_all block before calling halt.

Protection layers

1
robots.txtstatic_folder "public" — static handler runs before filters; AI bots can always read it
2
noai meta tagIn HTML heredoc strings or ECR template files — in <head> section
3
X-Robots-Tag headerSet in before_all (for blocked + legitimate) or after_all (legitimate only) — see halt warning above
4
Hard 403 — before_all (global)halt(env, status_code: 403) — route handler never runs. Fires before every route.
5
Hard 403 — before_all "/path/*" (scoped)Kemal supports glob path patterns in filters — only /api/* is protected

Step 1 — Bot list (src/ai_bots.cr)

A Crystal constant array — allocated at program start. String#includes? for substring matching, Array#any? for short-circuit evaluation. The method signature ai_bot?(ua : String?) accepts both String and Nil — Crystal's union type.

# src/ai_bots.cr — shared bot detection module

module AiBots
  PATTERNS = [
    # OpenAI
    "gptbot", "chatgpt-user", "oai-searchbot",
    # Anthropic
    "claudebot", "claude-web",
    # Common Crawl
    "ccbot",
    # Bytedance
    "bytespider",
    # Meta
    "meta-externalagent",
    # Perplexity
    "perplexitybot",
    # Google AI
    "google-extended", "googleother",
    # Cohere
    "cohere-ai",
    # Amazon
    "amazonbot",
    # Diffbot
    "diffbot",
    # AI2
    "ai2bot",
    # DeepSeek
    "deepseekbot",
    # Mistral
    "mistralai-user",
    # xAI
    "xai-bot",
    # You.com
    "youbot",
    # DuckDuckGo AI
    "duckassistbot",
  ]

  # Crystal nil-safety: ua is String? (String or Nil)
  def self.ai_bot?(ua : String?) : Bool
    return false if ua.nil? || ua.empty?
    lower = ua.downcase
    PATTERNS.any? { |pattern| lower.includes?(pattern) }
  end
end

Step 2 — Global before_all filter

The next keyword skips the rest of the filter for specific paths. env.request.headers["User-Agent"]? (with ?) is the nil-safe accessor — returns nil if absent, never raises.

# src/app.cr — before_all filter for global bot blocking

require "kemal"
require "./ai_bots"

# Serve static files from public/ — includes robots.txt
# static_folder runs BEFORE filters, so /robots.txt is always accessible.
static_folder "public"

# before_all: runs before EVERY route handler
# Set headers BEFORE calling halt — after_all is NOT called after halt.
before_all do |env|
  # Skip the bot check for paths that must stay open
  next if env.request.path == "/robots.txt"
  next if env.request.path == "/health"

  # env.request.headers["User-Agent"]? returns String? (nil if absent)
  # The ? suffix is Crystal nil-safe hash access — never raises KeyError.
  ua = env.request.headers["User-Agent"]? || ""

  if AiBots.ai_bot?(ua)
    # Set headers BEFORE halt — they won't be set after halt raises.
    env.response.content_type = "text/plain; charset=utf-8"
    env.response.headers["X-Robots-Tag"] = "noai, noimageai"
    # halt raises Kemal::Exceptions::CustomException — route block never runs.
    halt env, status_code: 403, response: "Forbidden"
  end

  # For legitimate requests, add X-Robots-Tag here.
  # (after_all is skipped for halted requests, so we set it in before_all too)
  env.response.headers["X-Robots-Tag"] = "noai, noimageai"
end

# Route handlers — only reached for legitimate requests
get "/" do |env|
  env.response.content_type = "text/html; charset=utf-8"
  <<-HTML
  <!DOCTYPE html>
  <html>
  <head>
    <meta name="robots" content="noai, noimageai">
    <title>My Site</title>
  </head>
  <body><h1>Welcome</h1></body>
  </html>
  HTML
end

get "/health" do
  "ok"
end

get "/api/data" do |env|
  env.response.content_type = "application/json"
  {data: "protected"}.to_json
end

Kemal.run

Step 3 — Scoped filtering (before_get, before_all "/api/*")

Kemal supports method-specific filters and glob path patterns. Combine to protect only specific HTTP methods and path prefixes. The /api/* path filter only fires for routes matching that prefix.

# Per-route and per-method filter variants

require "kemal"
require "./ai_bots"

static_folder "public"

# before_get: only runs before GET handlers
# before_post, before_put, before_delete, before_patch also available
before_get do |env|
  next if env.request.path == "/robots.txt"
  next if env.request.path == "/health"

  ua = env.request.headers["User-Agent"]? || ""
  if AiBots.ai_bot?(ua)
    env.response.headers["X-Robots-Tag"] = "noai, noimageai"
    halt env, status_code: 403, response: "Forbidden"
  end
  env.response.headers["X-Robots-Tag"] = "noai, noimageai"
end

# Path-scoped filter — only fires for /api/* routes
before_all "/api/*" do |env|
  ua = env.request.headers["User-Agent"]? || ""
  if AiBots.ai_bot?(ua)
    env.response.content_type = "application/json"
    env.response.headers["X-Robots-Tag"] = "noai, noimageai"
    halt env, status_code: 403, response: %({"error":"Forbidden"})
  end
end

# Public routes — no filter applied for /health
get "/health" do
  "ok"
end

# Protected API routes — filtered by before_all "/api/*"
get "/api/data" do |env|
  env.response.content_type = "application/json"
  {data: "protected"}.to_json
end

Kemal.run

Step 4 — Using after_all for response headers

after_all only runs for legitimate (non-halted) requests. For blocked requests, headers must be set in before_all before calling halt. This pattern avoids duplicating the header for legitimate requests.

# after_all for X-Robots-Tag (legitimate requests only)
# Note: after_all does NOT run for halted requests.
# For blocked requests, set the header in before_all BEFORE halt.

require "kemal"
require "./ai_bots"

static_folder "public"

before_all do |env|
  next if env.request.path == "/robots.txt"
  ua = env.request.headers["User-Agent"]? || ""

  if AiBots.ai_bot?(ua)
    # Must set headers here — after_all won't fire after halt
    env.response.headers["X-Robots-Tag"] = "noai, noimageai"
    env.response.content_type = "text/plain"
    halt env, status_code: 403, response: "Forbidden"
  end
end

# after_all only runs for NON-halted requests (legitimate traffic)
after_all do |env|
  env.response.headers["X-Robots-Tag"] = "noai, noimageai"
end

get "/" do
  "Welcome"
end

Kemal.run

Step 5 — robots.txt via static_folder

static_folder "public" serves all files in public/ at their path. Kemal's static file handler runs before filters — so robots.txt is always accessible, even to AI bots that would otherwise be blocked.

# public/robots.txt — served automatically by static_folder "public"
# Place this file in your public/ directory.
# Kemal's static_folder handler runs before filters — so AI bots
# can always read robots.txt even if they'd be blocked on other routes.

User-agent: *
Allow: /

# AI training bots — blocked
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: AmazonBot
Disallow: /

User-agent: Diffbot
Disallow: /


# shard.yml — Crystal dependencies
# name: my-app
# dependencies:
#   kemal:
#     github: kemalcr/kemal
#     version: "~> 1.6"

Step 6 — noai meta tag in HTML responses

# noai meta tag in Kemal HTML responses

require "kemal"
require "ecr"  # Crystal's Embedded Crystal (ERB-like) template engine

# Option A: Inline heredoc
get "/" do |env|
  env.response.content_type = "text/html; charset=utf-8"
  <<-HTML
  <!DOCTYPE html>
  <html>
  <head>
    <meta name="robots" content="noai, noimageai">
    <title>My Site</title>
  </head>
  <body><h1>Welcome</h1></body>
  </html>
  HTML
end

# Option B: ECR template file (views/index.ecr)
# In views/index.ecr:
#   <meta name="robots" content="noai, noimageai">
#
# In route handler:
# get "/" do |env|
#   render "views/index.ecr"
# end

# The X-Robots-Tag header set in before_all/after_all covers non-HTML
# responses (JSON, binary). The meta tag covers scrapers that parse HTML.

Kemal vs Lucky vs Amber vs Ruby Sinatra

FeatureKemal (Crystal)Lucky (Crystal)Amber (Crystal)Sinatra (Ruby)
Filter typebefore_all / before_get / before_all "/path/*" blocksPipes — Lucky::Action pipe with call method, include in actionsbefore_action filter block at controller levelbefore filter block (Ruby Sinatra) — same concept, different syntax
Short-circuithalt(env, status_code: 403) — raises exception, route block never runsreturn redirect_to ... or custom response — halts action pipelinerespond_with { ... } and return — stops before_action chainhalt 403, "Forbidden" — identical concept to Kemal (Kemal is Sinatra-inspired)
after_* filters post-haltNOT called after halt — set headers BEFORE halt in before_allafter pipes run regardless — safe to set headers thereafter_action does not run if before_action haltsafter filter runs even after halt — safe to set response headers
UA header accessenv.request.headers["User-Agent"]? || "" (nil-safe Crystal)context.request.headers["User-Agent"]?.try(&.first?) || ""request.headers["User-Agent"]? (Amber wraps Crystal HTTP::Request)request.env["HTTP_USER_AGENT"] || "" (Rack env)
robots.txtstatic_folder "public" — serves public/robots.txt before filters runLucky::StaticFileHandler or explicit action routeAmber static middleware or explicit routeenable :static; set :public_folder, "public" (Rack::Static)
CompiledYes — Crystal compiles to native binary, no GC pausesYes — Crystal compiled, with Lucky's type-safe routingYes — Crystal compiledNo — Ruby interpreted, MRI GC
Nil safetyCrystal: String? type, [] raises KeyError, []? returns nilSame Crystal nil safety via Lucky wrappersSame Crystal nil safetyRuby: nil is valid, no nil safety enforcement

Summary

  • Set headers BEFORE halt after_all is skipped after a halt. X-Robots-Tag and Content-Type must be set in before_all before calling halt.
  • []? for nil-safe header access — Crystal's nil safety: always use env.request.headers["User-Agent"]? (with ?), not [].
  • static_folder before filters — robots.txt is always served, regardless of bot status.
  • before_all "/api/*" — scoped glob filter. Combine with next for path exclusions.
  • Compiled binary — Crystal compiles to native code. The bot check runs at native speed with no interpreter overhead.

Is your site protected from AI bots?

Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.