
How to Block AI Bots on Drupal

Drupal powers government agencies, universities, and major publishers — all prime targets for AI training crawlers. Four methods: direct robots.txt editing, the robotstxt module, noai meta tags, and .htaccess server blocking.

Self-Hosted Drupal

  • ✓ Edit robots.txt static file directly
  • ✓ noai meta tag via Metatag module
  • ✓ noai tag via html.html.twig template
  • ✓ .htaccess server-level blocking
  • ✓ Cloudflare WAF

Acquia / Pantheon / Platform.sh

  • ✓ Edit robots.txt in Git repo root
  • ✓ robotstxt module (UI-based, no SSH)
  • ✓ Metatag module for noai tags
  • ✓ Cloudflare WAF (any plan)
  • ✗ Direct .htaccess changes (platform-managed)

Quick fix — add to your Drupal robots.txt

The file lives in the Drupal root (the same folder as index.php). Edit it directly or via the robotstxt module.

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

Method 1: Edit the robots.txt File Directly

Unlike WordPress, Drupal ships with a real static robots.txt file in the document root. You can edit it directly — no plugin or module needed. This is the fastest approach for any Drupal install where you have file system or Git access.

Where to find it: The robots.txt file is in the Drupal root directory — the same folder as index.php, .htaccess, and composer.json. On a typical Linux server: /var/www/html/robots.txt. On Acquia/Pantheon: the repo root.
  1. Open the robots.txt file in your Drupal root (SSH, SFTP, or Git):

    nano /var/www/html/robots.txt
    # Or on Pantheon/Acquia: edit in your repo and commit
  2. Below the existing User-agent: * block, add the AI bot rules:

User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: webzio-extended
Disallow: /

User-agent: gemini-deep-research
Disallow: /

Sitemap: https://yourdomain.com/sitemap.xml
  3. Replace yourdomain.com with your actual domain. Save the file. For Git-based hosts, commit and push.

  4. Verify: visit https://yourdomain.com/robots.txt — confirm your new rules appear.
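The verify step can also be scripted rather than eyeballed. A minimal sketch, using a local sample file as a stand-in for the live /robots.txt (in production, fetch the real file first with curl; yourdomain.com is a placeholder):

```shell
# Sketch: verify that key AI bots are covered by robots.txt.
# In production, fetch the live file first:
#   curl -s https://yourdomain.com/robots.txt -o robots.txt
# Here a local sample stands in for the fetched file.
cat > robots.txt <<'EOF'
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
EOF

# Report which bots have a User-agent block and which are missing.
for bot in GPTBot ClaudeBot CCBot; do
  if grep -q "User-agent: $bot" robots.txt; then
    echo "$bot: blocked"
  else
    echo "$bot: MISSING"
  fi
done
```

Extend the loop with the full agent list from the "All 25 AI Bots to Block" section.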

Drupal core updates may overwrite robots.txt: When you run composer update drupal/core, Drupal's scaffold process can overwrite your customized robots.txt, depending on your scaffold settings. Check your composer.json under "drupal-scaffold" → "file-mapping", and add "[web-root]/robots.txt": false to prevent scaffolding from replacing the file. Or use the robotstxt module (Method 2) — it's immune to this issue.

Method 2: robotstxt Module (No SSH Required)

The robotstxt contrib module replaces the static robots.txt file with a database-driven version you can edit from the Drupal admin UI. Ideal for managed hosting environments where direct file editing is inconvenient, or for teams that manage content through the admin interface.

  1. Install the module via Composer:

    composer require drupal/robotstxt
    drush en robotstxt
    drush cr
  2. In Drupal Admin, go to Configuration → Search and Metadata → robots.txt.

  3. In the textarea, add the AI bot Disallow rules from Method 1 above. Click Save configuration.

  4. Verify at https://yourdomain.com/robots.txt. The module intercepts this path before the static file is served.

Module vs static file: With the robotstxt module active, the static robots.txt file in the Drupal root is ignored. The module serves its database content instead. If you later uninstall the module, Drupal falls back to the static file — so keep them in sync or delete the static file.

Method 3: noai Meta Tag (All Plans)

The robots.txt rules cover cooperative bots. The noai meta tag adds a second, page-level layer: an informal directive (not part of any official standard) that some crawlers check. There are two ways to add it to Drupal.

Option A: Metatag Module (recommended)

  1. Install the Metatag module if not already installed:

    composer require drupal/metatag
    drush en metatag
    drush cr
  2. Go to Admin → Configuration → Search and Metadata → Metatag.

  3. Click Global (or the relevant content type). Find the Robots field. Add noai, noimageai to the existing values.

  4. Click Save. Run drush cr to clear caches. Verify in the page source — search for noai.

Option B: html.html.twig Template (direct theme edit)

If you prefer not to install a module, add the tag directly to your active theme's base template:

  1. Find your active theme's html.html.twig — typically at:

    /web/themes/custom/YOUR_THEME/templates/layout/html.html.twig

    If the file doesn't exist in your custom theme, copy it from core/modules/system/templates/html.html.twig.

  2. Find the {{ head }} placeholder. Add the meta tag immediately after it:

    {{ head }}
    <meta name="robots" content="noai, noimageai">
  3. Clear Drupal's template cache: drush cr or Admin → Reports → Flush all caches.

Metatag vs template: The Metatag module is preferred — it stores values in config, survives theme changes, and lets non-developers manage meta values from the admin UI. The twig template approach is faster but breaks if you switch themes.
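Whichever option you pick, the result can be checked from the command line. A minimal sketch, using a local HTML sample as a stand-in for a rendered page (in production, fetch the live page with curl; yourdomain.com is a placeholder):

```shell
# Sketch: confirm the rendered markup carries the noai directive.
# In production, fetch the live page first:
#   curl -s https://yourdomain.com/ -o page.html
# A local sample stands in for the fetched markup here.
cat > page.html <<'EOF'
<html><head>
  <meta name="robots" content="noai, noimageai">
</head><body></body></html>
EOF

# Look for a robots meta tag that includes noai.
if grep -qi 'name="robots"' page.html && grep -qi 'noai' page.html; then
  echo "noai tag present"
else
  echo "noai tag missing"
fi
```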

Method 4: .htaccess Server Blocking (Apache)

For Apache-hosted Drupal installs, you can block AI bots at the server level via .htaccess. This is harder to bypass than robots.txt — the request is terminated before PHP or Drupal processes it.

Self-hosted Apache only

Open your Drupal .htaccess (in the Drupal root). Add this block near the top, before the RewriteEngine On line:

# Block AI training crawlers
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ChatGPT-User|OAI-SearchBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (ClaudeBot|anthropic-ai) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Google-Extended|Bytespider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (CCBot|PerplexityBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (meta-externalagent|Amazonbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Applebot-Extended|xAI-Bot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (DeepSeekBot|MistralBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Diffbot|cohere-ai) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (AI2Bot|DuckAssistBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (omgilibot|webzio-extended|gemini-deep-research) [NC]
RewriteRule ^ - [F,L]
</IfModule>
Placement matters: Add this block at the very top of .htaccess, before Drupal's existing rewrite section. A second RewriteEngine On directive is harmless, but rule order matters: the blocking rules should run before Drupal's front-controller rewrite to index.php. Either add your rules before Drupal's block, or merge them carefully into Drupal's existing rewrite section.

For nginx: Add the following to your Drupal server block (inside server {}):
if ($http_user_agent ~* "(GPTBot|ClaudeBot|CCBot|Google-Extended|Bytespider|PerplexityBot|anthropic-ai|meta-externalagent|Amazonbot|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|DuckAssistBot|omgilibot|webzio-extended|gemini-deep-research)") {
    return 403;
}
Reload nginx: nginx -t && systemctl reload nginx
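Before deploying either variant, you can sanity-check which user-agent strings the pattern catches. A minimal sketch that mirrors the case-insensitive substring match performed by the [NC] rewrite conditions and the nginx ~* regex (pattern trimmed to a few bots for brevity):

```shell
# Mirror the server-side match: case-insensitive, substring anywhere in the UA.
pattern='GPTBot|ClaudeBot|CCBot|Bytespider|PerplexityBot'

check_ua() {
  if echo "$1" | grep -qiE "$pattern"; then
    echo "403 blocked"
  else
    echo "200 allowed"
  fi
}

check_ua "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"
check_ua "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0"
```

Once the rules are live, a spoofed request such as curl -A "GPTBot" -I https://yourdomain.com/ should return 403 while a normal browser request still returns 200.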

Method 5: Cloudflare WAF (All Hosting)

Works for all Drupal hosting environments — self-hosted, Acquia, Pantheon, Platform.sh. Blocks bots at Cloudflare's edge before they reach your Drupal server. Highly effective against bots that ignore robots.txt (Bytespider in particular).

  1. Add your domain to Cloudflare (free plan) and update your DNS nameservers.
  2. Go to Security → WAF → Custom Rules → Create rule.
  3. Click Edit expression and paste:
(http.user_agent contains "GPTBot") or
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "anthropic-ai") or
(http.user_agent contains "Google-Extended") or
(http.user_agent contains "Bytespider") or
(http.user_agent contains "CCBot") or
(http.user_agent contains "PerplexityBot") or
(http.user_agent contains "meta-externalagent") or
(http.user_agent contains "DeepSeekBot") or
(http.user_agent contains "MistralBot") or
(http.user_agent contains "xAI-Bot") or
(http.user_agent contains "Diffbot") or
(http.user_agent contains "cohere-ai") or
(http.user_agent contains "AI2Bot") or
(http.user_agent contains "DuckAssistBot") or
(http.user_agent contains "omgilibot") or
(http.user_agent contains "webzio-extended") or
(http.user_agent contains "gemini-deep-research")

Set action to Block. Drupal never processes the request.

All 25 AI Bots to Block

User agents for the robots.txt rules, .htaccess, and Cloudflare WAF:

GPTBot
ChatGPT-User
OAI-SearchBot
ClaudeBot
anthropic-ai
Google-Extended
Bytespider
CCBot
PerplexityBot
meta-externalagent
Amazonbot
Applebot-Extended
xAI-Bot
DeepSeekBot
MistralBot
Diffbot
cohere-ai
AI2Bot
Ai2Bot-Dolma
YouBot
DuckAssistBot
omgili
omgilibot
webzio-extended
gemini-deep-research
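To keep this list as the single source of truth, the robots.txt blocks can be generated from it rather than typed by hand. A minimal sketch (the bots variable is trimmed for brevity; extend it with the full set above):

```shell
# Generate one User-agent/Disallow block per bot.
bots="GPTBot ChatGPT-User ClaudeBot CCBot Google-Extended"  # extend with the full list

for bot in $bots; do
  printf 'User-agent: %s\nDisallow: /\n\n' "$bot"
done > ai-bots.txt

# Inspect the generated rules before pasting them into robots.txt.
cat ai-bots.txt
```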

Why Drupal sites are prime AI training targets

Drupal powers the White House, NASA, Tesla, BBC, Harvard, and hundreds of government agencies. Its structured content model, clean semantic HTML, and taxonomy-rich pages make it exceptionally valuable training data. Drupal's Views module produces consistently formatted list pages — ideal for scraping at scale. CCBot (which feeds GPT, Gemini, Llama, and most open-source models) and Diffbot (sold to AI companies) both actively prioritise high-authority Drupal sites. If your Drupal site has been live for more than a year, it's almost certainly in multiple training datasets already.

Will This Affect Drupal SEO?

Safe to block

  • ✓ Google Search rankings unaffected
  • ✓ Bing rankings unaffected
  • ✓ Drupal Sitemap module unaffected
  • ✓ JSON-LD structured data unaffected
  • ✓ Metatag SEO fields unaffected
  • ✓ Pathauto URL paths unaffected

Consider before blocking

  • ⚠ OAI-SearchBot → removes from ChatGPT Search
  • ⚠ PerplexityBot → removes from Perplexity citations
  • ⚠ DuckAssistBot → removes from Duck.ai answers
  • For government and institutional Drupal sites, full AI search visibility may be desirable. Use the per-path approach in robots.txt to block training bots while allowing AI search bots on public content.

Protecting robots.txt from Composer Scaffold

When you run composer update drupal/core, the Drupal scaffold process may overwrite your robots.txt with the default version. Prevent this by adding a file-mapping exclusion to your composer.json:

{
  "extra": {
    "drupal-scaffold": {
      "file-mapping": {
        "[web-root]/robots.txt": false
      }
    }
  }
}

Setting the path to false tells Composer scaffold to skip that file entirely. Your custom robots.txt will survive all future composer update runs. Alternatively, use the robotstxt module — it's completely immune to scaffold changes since it doesn't use the static file.
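A quick grep confirms the opt-out is actually in place before the next core update. A minimal sketch, using a sample composer.json written locally as a stand-in for your project's file:

```shell
# Sketch: check that robots.txt is excluded from Composer scaffolding.
# A sample file stands in for your project's composer.json here.
cat > composer.json <<'EOF'
{
  "extra": {
    "drupal-scaffold": {
      "file-mapping": {
        "[web-root]/robots.txt": false
      }
    }
  }
}
EOF

if grep -q '"\[web-root\]/robots.txt": false' composer.json; then
  echo "robots.txt excluded from scaffolding"
else
  echo "robots.txt will be overwritten on core updates"
fi
```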

Frequently Asked Questions

Where is the robots.txt file in Drupal?

In the Drupal root directory — the same folder as index.php and .htaccess. On a typical self-hosted install: /var/www/html/robots.txt. On Acquia or Pantheon: the repository root. Edit it directly, or install the robotstxt contrib module for admin UI management.

How do I add a noai meta tag to every Drupal page?

Two methods: (1) Metatag module — Admin → Configuration → Search and Metadata → Metatag → Global → Robots field → add 'noai, noimageai'. (2) html.html.twig template — add <meta name="robots" content="noai, noimageai"> after the {{ head }} placeholder in your theme's layout template. The Metatag module is preferred — it survives theme changes.

Can I block AI bots without SSH or file system access?

Yes. Install the robotstxt module (drupal.org/project/robotstxt) — it replaces the static file with a UI-editable, database-stored version. For meta tags, use the Metatag module. Both are admin-only changes with no file system access needed.

Will Composer update overwrite my robots.txt?

Potentially — Drupal's scaffold process can overwrite the robots.txt file when you update core. Prevent this by adding "[web-root]/robots.txt": false to the drupal-scaffold file-mapping in your composer.json. Or use the robotstxt module, which manages robots.txt via the database and is completely unaffected by scaffold.

How do I block AI bots on Acquia or Pantheon?

Edit the robots.txt file in your Git repository root and commit the change — it deploys automatically. For meta tags, use the Metatag module (admin config, no Git commit needed). For edge blocking, proxy your domain through Cloudflare and add a WAF rule. Direct .htaccess changes for bot blocking may conflict with Acquia/Pantheon managed configs — test in a non-prod environment first.

Will blocking AI bots affect Drupal's Google Search rankings?

No. Blocking GPTBot, ClaudeBot, CCBot, Google-Extended, and other AI training bots has zero effect on Googlebot or Bingbot. Your Drupal site's Search API, sitemap, Pathauto paths, and structured data all continue working normally. Google Search rankings and Bing rankings are completely unaffected.
