
How to Block AI Bots on Magento

Magento and Adobe Commerce stores are prime AI scraping targets — product descriptions, pricing, and customer reviews are exactly the data AI companies want. Here's how to lock out 25+ AI crawlers using the built-in Admin panel, layout XML, and server-level rules.

Why e-commerce stores are high-priority AI targets

AI companies and data brokers don't just want your blog posts — they want your product catalog. Diffbot crawls Magento stores and sells structured product data (names, descriptions, prices, SKUs, reviews) to AI labs and hedge funds for competitive intelligence. CCBot feeds your product copy into 50+ LLM training datasets. GPTBot uses your store content to train ChatGPT on e-commerce language. Your product pages are doing unpaid work for your competitors' AI tools every single day.

Magento 2.4.x

  • ✓ Admin panel robots.txt editor
  • ✓ Physical robots.txt at root
  • ✓ noai tag via layout XML
  • ✓ pub/.htaccess blocking
  • ✓ Cloudflare WAF

Adobe Commerce Cloud

  • ✓ Admin panel robots.txt editor
  • ✓ pub/robots.txt via git
  • ✓ .magento.app.yaml nginx rules
  • ✓ Fastly WAF (built-in)
  • ✓ Cloudflare WAF

Magento 2.3.x (legacy)

  • ✓ Admin panel robots.txt editor
  • ✓ Physical robots.txt at root
  • ✓ noai tag via layout XML
  • ✓ pub/.htaccess blocking
  • ⚠️ Upgrade recommended

Quick fix — add to your Magento robots.txt

Paste into Admin → Content → Design → Configuration → Search Engines Robots → Custom Instructions field. Or add to the physical robots.txt in your Magento root.

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: Google-Extended
Disallow: /
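Before pasting, it can be worth sanity-checking that every bot in your list actually has a Disallow rule under it (a missing blank line or typo silently breaks a group). A minimal local sketch, using a temp file and a shortened bot list for illustration:

```shell
# Write a sample of the block list to a temp file, then confirm each
# User-agent entry is immediately followed by a "Disallow: /" line.
cat > /tmp/robots-check.txt <<'EOF'
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
EOF

for bot in GPTBot ClaudeBot CCBot; do
  if grep -A1 "^User-agent: $bot" /tmp/robots-check.txt | grep -q '^Disallow: /'; then
    echo "$bot: rule present"
  else
    echo "$bot: rule MISSING"
  fi
done
```

The same loop works against your live file once it is deployed: fetch it with curl -s and grep the output instead of the temp file.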

Available Methods

Admin Panel robots.txt (Recommended)

Easy

Content → Design → Configuration → Edit (your store) → Search Engines Robots

Magento 2.x includes a built-in robots.txt editor in the Admin panel. Add AI bot Disallow rules in the Custom Instructions field — no FTP or SSH required.

Available in Magento 2.x and Adobe Commerce (on-prem and Cloud). Changes clear the full-page cache automatically.

Physical robots.txt file

Easy

[magento-root]/robots.txt (or pub/robots.txt on Cloud)

A physical robots.txt in the Magento root will be served in addition to or instead of Admin settings. Edit via FTP/SSH. On Cloud, commit to the pub/ directory.

On some Magento versions, the Admin-generated content takes precedence. Test with curl -s https://yoursite.com/robots.txt after editing to confirm which rules are actually being served.

noai meta tag via theme layout XML

Medium

app/design/frontend/[Vendor]/[theme]/Magento_Theme/layout/default_head_blocks.xml

Add the noai/noimageai meta tag to every Magento page via layout XML in your theme. No coding, just XML. Run bin/magento cache:flush after saving.

Tells AI bots not to use your content even when they do visit. Respecting noai is voluntary — combine with robots.txt blocking.

.htaccess or nginx user-agent blocking

Intermediate

pub/.htaccess (Apache) or nginx.conf / .magento.app.yaml (Cloud)

Block AI crawlers at the server level before they reach the PHP application. The only reliable method for bots that ignore robots.txt, like Bytespider.

Magento ships with a complex pub/.htaccess — add bot rules at the top, before Magento's own directives.

Cloudflare WAF / Fastly WAF

Intermediate

Cloudflare Dashboard → Security → WAF → Custom Rules (or Fastly for Cloud)

Edge-level blocking — rejects bot requests before they reach your Magento server. Handles IP-based evasion and UA spoofing. Fastly WAF is built into Adobe Commerce Cloud.

Most effective method. Adobe Commerce Cloud customers: use Fastly WAF custom rules (included in your plan).

Method 1: Admin Panel robots.txt (Easiest)

Magento 2.x includes a built-in robots.txt editor — no FTP, no SSH needed. Go to Content → Design → Configuration, click Edit next to your store view, then expand the Search Engines Robots section.

  1. Log in to your Magento Admin. Go to Content → Design → Configuration.

  2. Click Edit next to your store view (or Global scope). Scroll down to the Search Engines Robots section and expand it.

  3. In the Custom instructions of robots.txt File field, paste the full AI bot block list below. Click Save Configuration.

  4. The cache clears automatically. Verify by visiting https://yourstore.com/robots.txt — your new rules should appear at the bottom.

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: webzio-extended
Disallow: /

User-agent: gemini-deep-research
Disallow: /

Method 2: Physical robots.txt File

You can also create or edit a physical robots.txt file at the Magento root (the same directory as index.php and composer.json). On Adobe Commerce Cloud, place the file in the pub/ directory and commit it to your git repository.

Admin vs. physical file: On some Magento 2 versions, the Admin-generated content is served as the complete robots.txt — a physical file in the root may be ignored. Test by checking which content appears at /robots.txt after saving. On Magento Cloud, a physical file in pub/robots.txt committed to git takes precedence.

For Adobe Commerce Cloud — commit robots.txt to your repository:

# In your project root:
# Add the full AI bot block list to pub/robots.txt
# Then commit and push:
git add pub/robots.txt
git commit -m "Block AI training bots via robots.txt"
git push origin main

Method 3: noai Meta Tag via Layout XML

The noai meta tag tells AI crawlers not to use your content for training, even if they do visit. Add it to every Magento page via your theme's layout XML — no PHP code required.

  1. Navigate to your theme directory: app/design/frontend/[Vendor]/[theme]/Magento_Theme/layout/

  2. Create or open default_head_blocks.xml. Add the noai meta tag inside the <head> block:

<?xml version="1.0"?>
<page xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="urn:magento:framework:View/Layout/etc/page_configuration.xsd">
    <head>
        <meta name="robots" content="noai, noimageai"/>
    </head>
</page>
  3. Flush the Magento cache:

     bin/magento cache:flush

  4. Verify by visiting any product page and checking the page source for <meta name="robots" content="noai, noimageai"> inside the <head>.
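If you have saved a page's source locally (for example with curl -s on a product URL), a quick grep confirms the tag rendered. This is a sketch with an illustrative temp file standing in for the fetched page:

```shell
# Simulate a rendered <head> fragment and check for the noai meta tag.
cat > /tmp/page-head.html <<'EOF'
<head>
    <meta name="robots" content="noai, noimageai"/>
</head>
EOF

if grep -q 'content="noai, noimageai"' /tmp/page-head.html; then
  echo "noai tag present"
else
  echo "noai tag missing"
fi
```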

Note on noai compliance: The noai meta tag is voluntary — AI companies that respect it include OpenAI, Anthropic, and Google. Bytespider, Diffbot, and some data brokers may ignore it. Use noai as a belt-and-suspenders addition to robots.txt blocking, not a replacement.

Method 4: .htaccess / nginx Blocking

For Apache installs, Magento ships with a pub/.htaccess file. Add user-agent blocking rules at the very top — before Magento's own directives — so matched bots get a 403 before any PHP is executed.

Apache (.htaccess) — add at the top of pub/.htaccess:

# ── AI Bot Blocking ──────────────────────────────────
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Diffbot|DeepSeekBot|MistralBot|cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|webzio-extended|gemini-deep-research|xAI-Bot|Applebot-Extended|Amazonbot) [NC]
    RewriteRule .* - [F,L]
</IfModule>
# ── End AI Bot Blocking ──────────────────────────────

nginx — add inside your server block:

# AI Bot Blocking
if ($http_user_agent ~* "(GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Diffbot|DeepSeekBot|MistralBot|cohere-ai|AI2Bot|YouBot|DuckAssistBot|omgili|omgilibot|webzio-extended|gemini-deep-research|xAI-Bot|Applebot-Extended|Amazonbot)") {
    return 403;
}
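Before deploying either rule, you can dry-run the user-agent pattern against sample UA strings. This sketch mirrors the regex alternation above with a shortened bot list; the check_ua helper is illustrative, not part of Magento:

```shell
# BLOCK_RE mirrors the alternation used in the .htaccess/nginx rules above
# (shortened here for readability).
BLOCK_RE='GPTBot|ClaudeBot|CCBot|Bytespider|Diffbot|Google-Extended'

# Print the status the server rule would return for a given User-Agent.
check_ua() {
  if printf '%s' "$1" | grep -Eiq "$BLOCK_RE"; then
    echo "403"
  else
    echo "200"
  fi
}

check_ua "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"         # expect 403
check_ua "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" # expect 200
```

Note the second case: Googlebot passes, since only "Google-Extended" is in the pattern. This is also a quick way to convince yourself the rules leave normal search crawlers alone.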

Adobe Commerce Cloud — a note on .magento.app.yaml:

The web.locations rules in .magento.app.yaml match request paths, not user agents, so they cannot selectively block AI crawlers by User-Agent string. For user-agent blocking on Commerce Cloud, use Fastly custom VCL or WAF rules (recommended), or rely on the robots.txt and noai methods above.
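For Commerce Cloud, a user-agent block can be expressed as a Fastly custom VCL snippet attached to vcl_recv. The following is a hedged sketch, not Adobe's documented configuration — the condition and bot list are assumptions to adapt; upload it through the Fastly module or API and test on a staging environment first:

```vcl
# Sketch only — verify against your Fastly service configuration before use.
sub vcl_recv {
  # Reject known AI crawler user agents at the CDN edge with a 403.
  if (req.http.User-Agent ~ "(?i)(GPTBot|ClaudeBot|CCBot|Bytespider|Diffbot|Google-Extended|meta-externalagent|PerplexityBot)") {
    error 403 "Forbidden";
  }
}
```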

Method 5: Cloudflare WAF / Fastly WAF

Edge-level blocking is the most reliable method — it stops bot requests before they even reach your Magento server. Fastly WAF is included with Adobe Commerce Cloud plans at no extra cost. For Magento Open Source on-prem, Cloudflare's free plan handles basic user-agent rules.

Cloudflare WAF (Magento Open Source)

  1. Proxy your domain through Cloudflare (orange cloud ☁️ in DNS).
  2. Go to Security → WAF → Custom Rules → Create rule.
  3. Set: http.user_agent contains "GPTBot" OR contains "ClaudeBot" etc.
  4. Action: Block. Deploy.
  5. Free plan supports basic string rules. Pro adds regex + rate limiting.

Fastly WAF (Adobe Commerce Cloud)

  1. Log in to Fastly (linked in your Cloud Console).
  2. Go to Security → WAF → Custom Rules.
  3. Create a rule matching req.http.User-Agent against AI bot patterns.
  4. Set action: Block (403).
  5. Fastly WAF handles Bytespider's IP-based evasion better than robots.txt.

Cloudflare rule expression for major AI bots:
(http.user_agent contains "GPTBot") or (http.user_agent contains "ClaudeBot") or (http.user_agent contains "CCBot") or (http.user_agent contains "Bytespider") or (http.user_agent contains "Google-Extended") or (http.user_agent contains "Diffbot") or (http.user_agent contains "meta-externalagent") or (http.user_agent contains "DeepSeekBot")

Which AI Bots Affect Magento Stores Most

Bot | Company | Magento Relevance | Priority
Diffbot | Diffbot | Sells structured product data, pricing, reviews to AI companies and hedge funds | Critical — data broker
CCBot | Common Crawl | Crawls product pages and descriptions for 50+ LLM training datasets | High — blocks 50+ AI models
GPTBot | OpenAI | Crawls store for GPT-4 / ChatGPT training | High
ClaudeBot | Anthropic | Crawls store for Claude model training | High
Google-Extended | Google | Trains Gemini on your product & category content | High
Bytespider | ByteDance | Known robots.txt violator — blocking required at server level | Critical — use WAF
meta-externalagent | Meta | Trains Llama models on product descriptions and reviews | High
DeepSeekBot | DeepSeek | Chinese AI lab crawler, outside GDPR/US jurisdiction | High
xAI-Bot | xAI (Grok) | Crawls e-commerce content and product data | Medium
PerplexityBot | Perplexity AI | AI search crawler — blocking removes you from Perplexity answers | Your call

Full AI Bot Reference

All 25 AI bots covered by the robots.txt block list above:

GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, anthropic-ai, Google-Extended, Bytespider, CCBot, PerplexityBot, meta-externalagent, Amazonbot, Applebot-Extended, xAI-Bot, DeepSeekBot, MistralBot, Diffbot, cohere-ai, AI2Bot, Ai2Bot-Dolma, YouBot, DuckAssistBot, omgili, omgilibot, webzio-extended, gemini-deep-research

Frequently Asked Questions

Where is the robots.txt file in Magento 2?

Magento 2 generates robots.txt dynamically based on settings in the Admin panel. There is also a physical file at the Magento root: [magento-root]/robots.txt (the same directory as index.php and composer.json). On Magento Cloud / Adobe Commerce Cloud, the robots.txt is served from the pub/ directory. The Admin panel editor (Content → Design → Configuration → Search Engines Robots) lets you add custom instructions that get appended to the generated file — this is the easiest way to add AI bot blocking rules.

How do I add a noai meta tag to every Magento page?

The cleanest method is via your theme's layout XML. Create or edit app/design/frontend/[Vendor]/[theme]/Magento_Theme/layout/default_head_blocks.xml and add a <meta name="robots" content="noai, noimageai"> element inside the <head> block. After saving, run bin/magento cache:flush to apply changes. Alternatively, create a small custom module that uses a layout XML observer to inject the meta tag globally.

Will blocking AI bots affect Magento SEO?

No. Blocking GPTBot, ClaudeBot, CCBot, Google-Extended, Diffbot, and other AI training/scraping bots has zero effect on Googlebot or Bingbot. Your Magento product pages, category pages, and blog content will continue to rank normally in Google and Bing search results. Magento's native sitemap generation, canonical URLs, and meta robots settings are completely unaffected.

Why is Diffbot especially dangerous for Magento stores?

Diffbot is a commercial data broker that crawls e-commerce stores and sells structured product data — names, descriptions, prices, SKUs, reviews — to AI companies and hedge funds. For Magento stores, this means your entire product catalog, pricing strategy, and customer reviews can end up in AI training datasets and competitor intelligence databases without your knowledge. Unlike GPTBot, which reliably honors robots.txt, Diffbot has historically been less consistent about respecting robots.txt on commercial stores, so pair your robots.txt rules with server-level or WAF blocking.

How do I block AI bots on Adobe Commerce Cloud (Magento Cloud)?

On Adobe Commerce Cloud, you have several options: (1) Use the Admin panel robots.txt editor — this works the same as on-prem; (2) Add a physical robots.txt to your project's pub/ directory and commit it to git — this file will override the Admin-generated one; (3) Add nginx user-agent blocking rules to your .magento.app.yaml configuration using the 'web.locations' property; (4) Use Fastly WAF (included with Commerce Cloud) to block bots at the CDN edge. The Fastly route is most effective for stopping bots like Bytespider that ignore robots.txt.

Does Magento have a built-in way to block specific user agents?

Magento does not have a built-in UI for blocking specific user agents (other than through robots.txt). For server-level blocking, use .htaccess (Apache installs ship with a .htaccess in the pub/ directory) or nginx location blocks. For Adobe Commerce Cloud with Fastly, create a custom WAF rule to block by user agent — this is more reliable than robots.txt because it rejects requests before they reach the PHP application layer.
