How to Block AI Bots on Apache: Complete 2026 Guide
Apache's module system gives you two clean paths to block AI crawlers: mod_rewrite (RewriteCond on HTTP_USER_AGENT) and mod_setenvif (BrowserMatch + Require not env). Both return a hard 403 before your application code runs. Header always set X-Robots-Tag from mod_headers handles the HTTP-layer training opt-out. And critically: always configure in VirtualHost, not .htaccess — .htaccess is re-parsed on every request.
Use VirtualHost, not .htaccess — unless on shared hosting
Apache reads and parses .htaccess on every single request — including the bot requests you're trying to block. VirtualHost config in httpd.conf or /etc/apache2/sites-available/ is parsed once at startup. .htaccess also requires AllowOverride All, which lets anyone who can write files into your webroot change server behavior. Only use .htaccess on shared hosting where you have no VirtualHost access.
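If you're moving config out of .htaccess, it's worth checking whether stray copies remain under the webroot, since Apache will still parse them per request wherever overrides are allowed. A minimal check (the /var/www path is an assumption; adjust to your webroot):

```shell
# List stray .htaccess files Apache would re-parse on every request
WEBROOT="${WEBROOT:-/var/www}"
find "$WEBROOT" -name .htaccess -type f 2>/dev/null || true
```

Any file this prints is being parsed on every matching request until you move its rules into the VirtualHost and set AllowOverride None.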
Methods at a glance
| Method | What it does | Module required |
|---|---|---|
| robots.txt in DocumentRoot | Signals bots which paths are off-limits | Built-in |
| mod_rewrite + RewriteCond | Hard 403 on known AI User-Agents | mod_rewrite |
| mod_setenvif + BrowserMatch | Set env var → Require env (modern) | mod_setenvif + mod_authz_core |
| Header always set X-Robots-Tag | noai header on all HTTP responses | mod_headers |
| noai <meta> tag | AI training opt-out per HTML page | Your app / HTML files |
| mod_evasive | Rate limit to catch UA-rotating bots | mod_evasive |
1. robots.txt — DocumentRoot
Place robots.txt in your DocumentRoot directory (e.g. /var/www/html/). Apache serves it automatically. An optional Location block makes the handling explicit and ensures the file is always served directly by the default handler:
# VirtualHost config
<VirtualHost *:443>
ServerName example.com
DocumentRoot /var/www/html
# Clean handling for robots.txt
<Location "/robots.txt">
SetHandler default-handler
Options None
Require all granted
</Location>
</VirtualHost>

# /var/www/html/robots.txt
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: *
Allow: /

2. Hard 403 — mod_rewrite
mod_rewrite evaluates RewriteCond conditions before the RewriteRule. Match the HTTP_USER_AGENT server variable with a pipe-separated regex of known AI bot strings, then return a 403 with the [F] (Forbidden) flag. The [NC] flag makes the match case-insensitive.
# VirtualHost config (or .htaccess on shared hosting)
<VirtualHost *:443>
ServerName example.com
DocumentRoot /var/www/html
<IfModule mod_rewrite.c>
RewriteEngine On
# Allow robots.txt through — checked first
RewriteRule ^/robots\.txt$ - [L]
# Block known AI bots — [F] returns 403, [L] stops processing, [NC] case-insensitive
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (YouBot|Applebot-Extended|DuckAssistBot|meta-externalagent|MistralAI-Spider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (oai-searchbot) [NC]
RewriteRule .* - [F,L]
</IfModule>
</VirtualHost>

RewriteCond flag syntax

- [NC] — case-insensitive match
- [OR] — OR this condition with the next one (default is AND)
- [F] — return 403 Forbidden immediately
- [L] — last rule, stop processing
- The final RewriteCond has no [OR] — that's intentional: [OR] chains a condition to the one that follows it, so all four conditions form a single OR group ending at the last one. If any condition matches, the RewriteRule fires.
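You can sanity-check the alternation itself before reloading Apache. grep -E uses a compatible dialect for a plain alternation like this (a sketch; the User-Agent string below is illustrative):

```shell
# Test a User-Agent string against the same alternation used in the RewriteCond
UA="Mozilla/5.0; compatible; GPTBot/1.2"
PATTERN='GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider'
if printf '%s' "$UA" | grep -qiE "$PATTERN"; then
    echo "would be blocked (403)"
else
    echo "would pass through"
fi
```

Once the config is live, a request sent with curl -I -A "GPTBot" https://example.com/ should come back 403, while a normal browser User-Agent gets 200.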
3. mod_setenvif — BrowserMatch (Apache 2.4)
The modern Apache 2.4 approach uses BrowserMatch from mod_setenvif to set an environment variable, then denies access with Require not env from mod_authz_core. This avoids mod_rewrite entirely.
<VirtualHost *:443>
ServerName example.com
DocumentRoot /var/www/html
# BrowserMatch sets an env variable when User-Agent matches
# BrowserMatch is case-sensitive; use BrowserMatchNoCase if you need case-insensitive matching
BrowserMatch "GPTBot" bad_bot
BrowserMatch "ChatGPT-User" bad_bot
BrowserMatch "ClaudeBot" bad_bot
BrowserMatch "Claude-Web" bad_bot
BrowserMatch "anthropic-ai" bad_bot
BrowserMatch "CCBot" bad_bot
BrowserMatch "Google-Extended" bad_bot
BrowserMatch "PerplexityBot" bad_bot
BrowserMatch "Amazonbot" bad_bot
BrowserMatch "Bytespider" bad_bot
BrowserMatch "YouBot" bad_bot
BrowserMatch "DuckAssistBot" bad_bot
BrowserMatch "meta-externalagent" bad_bot
BrowserMatch "MistralAI-Spider" bad_bot
BrowserMatch "oai-searchbot" bad_bot
# Apply to all paths except robots.txt
<LocationMatch "^(?!/robots\.txt)">
<RequireAll>
Require all granted
Require not env bad_bot
</RequireAll>
</LocationMatch>
</VirtualHost>

RequireAll with Require not env bad_bot returns 403 when the variable is set. This is the cleanest approach on Apache 2.4+.
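The fifteen directives can also be collapsed into a single line. BrowserMatchNoCase (also from mod_setenvif) makes the case-insensitivity explicit — a sketch using the same bot list:

```apache
# Equivalent one-liner, case-insensitive via BrowserMatchNoCase
BrowserMatchNoCase "(GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai|CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider|YouBot|DuckAssistBot|meta-externalagent|MistralAI-Spider|oai-searchbot)" bad_bot
```

One line is easier to keep in sync with the mod_rewrite regex when the bot list changes.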
4. X-Robots-Tag — mod_headers
The Header directive from mod_headers adds HTTP response headers. always is required — without it Apache only sends the header on 2xx responses. Enable the module first: sudo a2enmod headers.
<VirtualHost *:443>
ServerName example.com
# Add to ALL responses (2xx + 4xx + 5xx)
# "always" is required — without it, header only sent on 2xx
Header always set X-Robots-Tag "noai, noimageai"
# Or scope it to HTML only (exclude API/JSON endpoints):
<FilesMatch "\.(html|htm)$">
Header always set X-Robots-Tag "noai, noimageai"
</FilesMatch>
</VirtualHost>

# Enable mod_headers on Debian/Ubuntu:
sudo a2enmod headers
sudo systemctl reload apache2
# Verify it's loaded:
apache2ctl -M | grep headers

5. noai meta tag — static HTML
Apache does not modify HTML content. For static sites served by Apache, add the noai meta tag directly in HTML or via your SSG base layout. For PHP, add it in your main layout file.
<!-- In your HTML <head> -->
<meta name="robots" content="noai, noimageai">
<!-- PHP layout (e.g. header.php): -->
<?php $robots = $robots ?? 'noai, noimageai'; ?>
<meta name="robots" content="<?= htmlspecialchars($robots) ?>">
<!-- WordPress: add to <head> in functions.php -->
add_action('wp_head', function() {
echo '<meta name="robots" content="noai, noimageai">' . "\n";
});

Use X-Robots-Tag (Section 4) as the HTTP-layer equivalent when you can't modify HTML files.
6. Rate limiting — mod_evasive
mod_evasive blocks IPs that exceed request thresholds — catching bots that rotate User-Agents to evade string matching. It's a DoS/scraping mitigation layer, not a replacement for User-Agent blocking.
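One caveat before tuning thresholds: aggressive limits can also trip your own monitoring or a load balancer health check. mod_evasive's DOSWhitelist directive exempts trusted sources (the addresses below are placeholders):

```apache
<IfModule mod_evasive20.c>
    # Exempt trusted sources (monitoring, load balancer) from rate limiting
    DOSWhitelist 127.0.0.1
    DOSWhitelist 10.0.0.*
</IfModule>
```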
# Install on Debian/Ubuntu:
sudo apt install libapache2-mod-evasive
sudo a2enmod evasive
# /etc/apache2/mods-available/evasive.conf
<IfModule mod_evasive20.c>
# Requests to same page within DOSPageInterval → trigger block
DOSPageCount 5
DOSPageInterval 1
# Total requests to site within DOSSiteInterval → trigger block
DOSSiteCount 50
DOSSiteInterval 1
# How long (seconds) IP stays blocked
DOSBlockingPeriod 10
# Log blocked IPs
DOSLogDir /var/log/mod_evasive
# DOSEmailNotify admin@example.com
</IfModule>

# Create log dir with correct permissions:
sudo mkdir -p /var/log/mod_evasive
sudo chown www-data:www-data /var/log/mod_evasive
sudo systemctl reload apache2

7. Reverse proxy setup
Apache can front a Node, Python, or other backend with mod_proxy + mod_proxy_http. The bot check (mod_rewrite or BrowserMatch) fires before ProxyPass — blocked bots never reach your backend.
# Enable required modules:
# sudo a2enmod proxy proxy_http rewrite headers
<VirtualHost *:443>
ServerName example.com
ProxyPreserveHost On
# mod_proxy_http adds X-Forwarded-For with the real client IP automatically
RequestHeader set X-Forwarded-Proto "https"
Header always set X-Robots-Tag "noai, noimageai"
# robots.txt — serve locally, don't proxy
Alias /robots.txt /var/www/html/robots.txt
<Location "/robots.txt">
SetHandler default-handler
Require all granted
</Location>
<IfModule mod_rewrite.c>
RewriteEngine On
# Bot check fires BEFORE ProxyPass — blocked bots never hit backend
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (YouBot|DuckAssistBot|meta-externalagent|MistralAI-Spider|oai-searchbot) [NC]
RewriteRule .* - [F,L]
# Proxy to backend
RewriteRule ^/(.*) http://127.0.0.1:3000/$1 [P,L]
</IfModule>
# Or using ProxyPass directly (after the RewriteRule block):
# ProxyPass / http://127.0.0.1:3000/
# ProxyPassReverse / http://127.0.0.1:3000/
</VirtualHost>

8. Full VirtualHost example
A complete production VirtualHost combining robots.txt, mod_rewrite blocking, X-Robots-Tag, and SSL. Save to /etc/apache2/sites-available/example.com.conf.
# /etc/apache2/sites-available/example.com.conf
# HTTP → HTTPS redirect
<VirtualHost *:80>
ServerName example.com
Redirect permanent / https://example.com/
</VirtualHost>
# HTTPS VirtualHost
<VirtualHost *:443>
ServerName example.com
DocumentRoot /var/www/html/example.com
# SSL
SSLEngine on
SSLCertificateFile /etc/letsencrypt/live/example.com/fullchain.pem
SSLCertificateKeyFile /etc/letsencrypt/live/example.com/privkey.pem
# Security headers + AI training opt-out
Header always set X-Robots-Tag "noai, noimageai"
Header always set X-Content-Type-Options "nosniff"
# robots.txt — always serve directly
<Location "/robots.txt">
SetHandler default-handler
Require all granted
</Location>
<Directory "/var/www/html/example.com">
Options -Indexes +FollowSymLinks
# Never AllowOverride All in production
AllowOverride None
Require all granted
<IfModule mod_rewrite.c>
RewriteEngine On
# Skip robots.txt
RewriteRule ^robots\.txt$ - [L]
# Block AI bots
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ChatGPT-User|ClaudeBot|Claude-Web|anthropic-ai) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (CCBot|Google-Extended|PerplexityBot|Amazonbot|Bytespider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (YouBot|Applebot-Extended|DuckAssistBot|meta-externalagent) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (MistralAI-Spider|oai-searchbot) [NC]
RewriteRule .* - [F,L]
# SPA fallback (if serving a React/Vue/Angular app):
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ /index.html [L]
</IfModule>
</Directory>
ErrorLog ${APACHE_LOG_DIR}/example.com-error.log
CustomLog ${APACHE_LOG_DIR}/example.com-access.log combined
</VirtualHost>

# Enable site and reload
sudo a2ensite example.com.conf
sudo a2enmod rewrite headers ssl
sudo apache2ctl configtest # test before reload
sudo systemctl reload apache2

9. Docker deployment
The official httpd:alpine image is the standard choice. Bake your config into the image for immutable deployments, or volume-mount for easier iteration.
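One gotcha: in the stock httpd.conf shipped with the official httpd image, mod_rewrite (and typically mod_headers) are commented out, so the config you COPY into the image must load the modules it uses. A sketch, with module paths as laid out in the official image (verify with httpd -M inside the container):

```apache
# In the httpd.conf you copy into the image
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule headers_module modules/mod_headers.so
LoadModule setenvif_module modules/mod_setenvif.so
```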
# Dockerfile
FROM httpd:alpine
# Copy custom config (httpd.conf) and webroot
COPY httpd.conf /usr/local/apache2/conf/httpd.conf
COPY dist/ /usr/local/apache2/htdocs/
EXPOSE 80
CMD ["httpd-foreground"]

# docker-compose.yml
services:
apache:
image: httpd:alpine
ports:
- "80:80"
- "443:443"
volumes:
- ./httpd.conf:/usr/local/apache2/conf/httpd.conf:ro
- ./dist:/usr/local/apache2/htdocs:ro
restart: unless-stopped

# Test config inside container:
docker exec apache-container httpd -t
docker exec apache-container httpd -k graceful

Frequently asked questions
Should I block AI bots in .htaccess or VirtualHost?
Always prefer VirtualHost config in httpd.conf or sites-available/. Apache reads and parses .htaccess on every request (including the bot requests you're trying to block), and requires AllowOverride All which is a security risk. Use .htaccess only on shared hosting where VirtualHost access isn't available.
What is the difference between mod_rewrite and mod_setenvif?
mod_setenvif (BrowserMatch) sets an environment variable and pairs with Require not env bad_bot — the cleaner Apache 2.4 approach with no regex flags to remember. mod_rewrite uses RewriteCond %{HTTP_USER_AGENT} + the [F] flag — more verbose but works in both VirtualHost and .htaccess, and handles the robots.txt exemption more explicitly.
How do I add X-Robots-Tag headers in Apache?
Header always set X-Robots-Tag "noai, noimageai" in VirtualHost, with mod_headers enabled (sudo a2enmod headers). The always keyword sends the header on all responses including 4xx/5xx. Without it, Apache only sends the header on 2xx responses.
How do I serve robots.txt from Apache?
Place it in your DocumentRoot — Apache serves it automatically. Optionally add a <Location "/robots.txt"> block with SetHandler default-handler so the file is always served directly, and set AllowOverride None in your <Directory> block to prevent .htaccess interference.
Does Apache bot blocking work as a reverse proxy?
Yes. With mod_proxy + mod_proxy_http, place the mod_rewrite or BrowserMatch bot check before the ProxyPass directive. Blocked bots receive a 403 from Apache without the request ever reaching your backend. Add ProxyPreserveHost On; mod_proxy_http forwards the real client IP in X-Forwarded-For automatically.
What is mod_evasive and should I use it?
mod_evasive blocks IPs that exceed request thresholds (DOSPageCount per URL, DOSSiteCount per site). It catches bots that rotate User-Agents to evade string matching. It complements mod_rewrite/BrowserMatch — use both. Install with sudo apt install libapache2-mod-evasive && sudo a2enmod evasive.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.