How to Block AI Bots on Magento
Magento and Adobe Commerce stores are prime AI scraping targets — product descriptions, pricing, and customer reviews are exactly the data AI companies want. Here's how to lock out 25+ AI crawlers using the built-in Admin panel, layout XML, and server-level rules.
Why e-commerce stores are high-priority AI targets
AI companies and data brokers don't just want your blog posts — they want your product catalog. Diffbot crawls Magento stores and sells structured product data (names, descriptions, prices, SKUs, reviews) to AI labs and hedge funds for competitive intelligence. CCBot feeds your product copy into 50+ LLM training datasets. GPTBot uses your store content to train ChatGPT on e-commerce language. Your product pages are doing unpaid work for your competitors' AI tools every single day.
Magento 2.4.x
- ✓ Admin panel robots.txt editor
- ✓ Physical robots.txt at root
- ✓ noai tag via layout XML
- ✓ pub/.htaccess blocking
- ✓ Cloudflare WAF
Adobe Commerce Cloud
- ✓ Admin panel robots.txt editor
- ✓ pub/robots.txt via git
- ✓ .magento.app.yaml nginx rules
- ✓ Fastly WAF (built-in)
- ✓ Cloudflare WAF
Magento 2.3.x (legacy)
- ✓ Admin panel robots.txt editor
- ✓ Physical robots.txt at root
- ✓ noai tag via layout XML
- ✓ pub/.htaccess blocking
- ⚠️ Upgrade recommended
Quick fix — add to your Magento robots.txt
Paste into Admin → Content → Design → Configuration → Search Engines Robots → Custom Instructions field. Or add to the physical robots.txt in your Magento root.
```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: Google-Extended
Disallow: /
```
Available Methods
Admin Panel robots.txt (Recommended)
Easy — Content → Design → Configuration → Edit (your store) → Search Engines Robots
Magento 2.x includes a built-in robots.txt editor in the Admin panel. Add AI bot Disallow rules in the Custom Instructions field — no FTP or SSH required.
Available in Magento 2.x and Adobe Commerce (on-prem and Cloud). Changes clear the full-page cache automatically.
Physical robots.txt file
Easy — [magento-root]/robots.txt (or pub/robots.txt on Cloud)
A physical robots.txt in the Magento root will be served in addition to or instead of Admin settings. Edit via FTP/SSH. On Cloud, commit to the pub/ directory.
On some Magento versions, Admin-generated content takes precedence. Test with curl -s https://yoursite.com/robots.txt after editing to confirm which version is served.
noai meta tag via theme layout XML
Medium — app/design/frontend/[Vendor]/[theme]/Magento_Theme/layout/default_head_blocks.xml
Add the noai/noimageai meta tag to every Magento page via layout XML in your theme. No PHP required, just XML. Run bin/magento cache:flush after saving.
Tells AI bots not to use your content even when they do visit. Respecting noai is voluntary — combine with robots.txt blocking.
.htaccess or nginx user-agent blocking
Intermediate — pub/.htaccess (Apache) or nginx.conf / .magento.app.yaml (Cloud)
Block AI crawlers at the server level before they reach the PHP application. The only reliable method for bots that ignore robots.txt, like Bytespider.
Magento ships with a complex pub/.htaccess — add bot rules at the top, before Magento's own directives.
Cloudflare WAF / Fastly WAF
Intermediate — Cloudflare Dashboard → Security → WAF → Custom Rules (or Fastly for Cloud)
Edge-level blocking — rejects bot requests before they reach your Magento server. Handles IP-based evasion and UA spoofing. Fastly WAF is built into Adobe Commerce Cloud.
Most effective method. Adobe Commerce Cloud customers: use Fastly WAF custom rules (included in your plan).
Method 1: Admin Panel robots.txt (Easiest)
Magento 2.x includes a built-in robots.txt editor — no FTP, no SSH needed. Go to Content → Design → Configuration, click Edit next to your store view, then expand the Search Engines Robots section.
1. Log in to your Magento Admin. Go to Content → Design → Configuration.
2. Click Edit next to your store view (or Global scope). Scroll down to the Search Engines Robots section and expand it.
3. In the Custom instructions of robots.txt File field, paste the full AI bot block list below. Click Save Configuration.
4. The cache clears automatically. Verify by visiting https://yourstore.com/robots.txt — your new rules should appear at the bottom.
```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: webzio-extended
Disallow: /

User-agent: gemini-deep-research
Disallow: /
```
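Maintaining 25 stanzas by hand is error-prone. A small shell sketch that generates the stanzas from a single list (bot names abbreviated here; extend with the full list above):

```shell
# Generate robots.txt stanzas from one bot list, so adding a bot later
# is a one-word change. Abbreviated list for illustration.
bots="GPTBot ClaudeBot CCBot Diffbot Google-Extended Bytespider"
for bot in $bots; do
  printf 'User-agent: %s\nDisallow: /\n\n' "$bot"
done
```

Redirect the output into the Custom Instructions field or append it to a physical robots.txt.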
Method 2: Physical robots.txt File
You can also create or edit a physical robots.txt file at the Magento root (the same directory as index.php and composer.json). On Adobe Commerce Cloud, place the file in the pub/ directory and commit it to your git repository.
Verify by visiting /robots.txt after saving. On Magento Cloud, a physical file at pub/robots.txt committed to git takes precedence. For Adobe Commerce Cloud, commit robots.txt to your repository:
```shell
# In your project root:
# Add the full AI bot block list to pub/robots.txt, then commit and push:
git add pub/robots.txt
git commit -m "Block AI training bots via robots.txt"
git push origin main
```
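Once deployed, it's worth confirming that the served file actually contains the rules. A minimal check sketch — the domain is a placeholder, and the here-doc stands in for the downloaded file:

```shell
# In production you'd fetch the live file first:
#   curl -s https://yourstore.example/robots.txt -o robots.txt
# Here a local copy simulates the download:
cat > robots.txt <<'EOF'
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
EOF

# Report which bots have a rule and which are missing:
for bot in GPTBot ClaudeBot CCBot; do
  if grep -q "^User-agent: $bot" robots.txt; then
    echo "$bot: rule present"
  else
    echo "$bot: MISSING"
  fi
done
```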
Method 3: noai Meta Tag via Layout XML
The noai meta tag tells AI crawlers not to use your content for training, even if they do visit. Add it to every Magento page via your theme's layout XML — no PHP code required.
1. Navigate to your theme directory: app/design/frontend/[Vendor]/[theme]/Magento_Theme/layout/
2. Create or open default_head_blocks.xml. Add the noai meta tag inside the <head> block:

```xml
<?xml version="1.0"?>
<page xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="urn:magento:framework:View/Layout/etc/page_configuration.xsd">
    <head>
        <meta name="robots" content="noai, noimageai"/>
    </head>
</page>
```

3. Flush the Magento cache: bin/magento cache:flush
4. Verify by visiting any product page and checking the page source for <meta name="robots" content="noai, noimageai"/> inside the <head>.
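After the cache flush, you can confirm the tag is actually rendered. Against the live store you would grep the homepage HTML (the domain below is a placeholder); the same grep is shown against a sample of rendered output:

```shell
# Live check (placeholder domain):
#   curl -s https://yourstore.example/ | grep -io '<meta name="robots"[^>]*>'
# Same grep against sample rendered <head> output:
html='<head><meta name="robots" content="noai, noimageai"/><title>Home</title></head>'
echo "$html" | grep -io '<meta name="robots"[^>]*>'
```

If the tag is missing, re-check the theme path and run bin/magento cache:flush again.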
Method 4: .htaccess / nginx Blocking
For Apache installs, Magento ships with a pub/.htaccess file. Add user-agent blocking rules at the very top — before Magento's own directives — so matched bots get a 403 before any PHP is executed.
Apache (.htaccess) — add at the top of pub/.htaccess:

```apache
# ── AI Bot Blocking ──────────────────────────────────
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Diffbot|DeepSeekBot|MistralBot|cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|webzio-extended|gemini-deep-research|xAI-Bot|Applebot-Extended|Amazonbot) [NC]
    RewriteRule .* - [F,L]
</IfModule>
# ── End AI Bot Blocking ──────────────────────────────
```

nginx — add inside your server block:
```nginx
# AI Bot Blocking
if ($http_user_agent ~* "(GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Diffbot|DeepSeekBot|MistralBot|cohere-ai|AI2Bot|YouBot|DuckAssistBot|omgili|omgilibot|webzio-extended|gemini-deep-research|xAI-Bot|Applebot-Extended|Amazonbot)") {
    return 403;
}
```

Adobe Commerce Cloud — a note on .magento.app.yaml: the rules under web.locations match URL paths, not request headers, so .magento.app.yaml cannot block crawlers by User-Agent string. For user-agent blocking on Commerce Cloud, use Fastly custom VCL or WAF rules instead (see Method 5).

Method 5: Cloudflare WAF / Fastly WAF
Edge-level blocking is the most reliable method — it stops bot requests before they even reach your Magento server. Fastly WAF is included with Adobe Commerce Cloud plans at no extra cost. For Magento Open Source on-prem, Cloudflare's free plan handles basic user-agent rules.
Cloudflare WAF (Magento Open Source)
1. Proxy your domain through Cloudflare (orange cloud ☁️ in DNS).
2. Go to Security → WAF → Custom Rules → Create rule.
3. Set the expression: http.user_agent contains "GPTBot" OR http.user_agent contains "ClaudeBot", and so on for each bot.
4. Action: Block. Deploy.
5. The Free plan supports basic string rules; Pro adds regex and rate limiting.
Fastly WAF (Adobe Commerce Cloud)
- 1. Log in to Fastly (linked in your Cloud Console).
- 2. Go to Security → WAF → Custom Rules.
- 3. Create a rule matching
req.http.User-Agentagainst AI bot patterns. - 4. Set action: Block (403).
- 5. Fastly WAF handles Bytespider's IP-based evasion better than robots.txt.
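On Commerce Cloud these rules are typically expressed as a custom VCL snippet in vcl_recv. A minimal sketch, with an abbreviated bot list (extend the pattern with the full list from Method 4):

```vcl
# Fastly custom VCL snippet (vcl_recv) — abbreviated bot list
if (req.http.User-Agent ~ "(?i)(GPTBot|ClaudeBot|CCBot|Bytespider|Diffbot)") {
  error 403 "Forbidden";
}
```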
Cloudflare custom rule expression (paste into the expression editor):

```
(http.user_agent contains "GPTBot") or
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "CCBot") or
(http.user_agent contains "Bytespider") or
(http.user_agent contains "Google-Extended") or
(http.user_agent contains "Diffbot") or
(http.user_agent contains "meta-externalagent") or
(http.user_agent contains "DeepSeekBot")
```
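An expression like the one above can be generated from a plain bot list, which is easier to keep in sync with your robots.txt. A sketch (abbreviated list):

```shell
# Build the Cloudflare expression from one bot list.
bots="GPTBot ClaudeBot CCBot Bytespider"
expr=""
for bot in $bots; do
  clause="(http.user_agent contains \"$bot\")"
  if [ -z "$expr" ]; then
    expr="$clause"
  else
    expr="$expr or $clause"
  fi
done
echo "$expr"
```

Paste the printed expression into the Cloudflare expression editor.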
Which AI Bots Affect Magento Stores Most
| Bot | Company | Magento Relevance | Priority |
|---|---|---|---|
| Diffbot | Diffbot | Sells structured product data, pricing, reviews to AI companies and hedge funds | Critical — data broker |
| CCBot | Common Crawl | Crawls product pages and descriptions for 50+ LLM training datasets | High — blocks 50+ AI models |
| GPTBot | OpenAI | Crawls store for GPT-4 / ChatGPT training | High |
| ClaudeBot | Anthropic | Crawls store for Claude model training | High |
| Google-Extended | Trains Gemini on your product & category content | High | |
| Bytespider | ByteDance | Known robots.txt violator — blocking required at server level | Critical — use WAF |
| meta-externalagent | Meta | Trains Llama models on product descriptions and reviews | High |
| DeepSeekBot | DeepSeek | Chinese AI lab crawler, outside GDPR/US jurisdiction | High |
| xAI-Bot | xAI (Grok) | Crawls e-commerce content and product data | Medium |
| PerplexityBot | Perplexity AI | AI search crawler — blocking removes you from Perplexity answers | Your call |
Full AI Bot Reference
The robots.txt block list in Method 1 covers all 25 AI bots referenced in this guide.
Frequently Asked Questions
Where is the robots.txt file in Magento 2?
Magento 2 generates robots.txt dynamically based on settings in the Admin panel. There is also a physical file at the Magento root: [magento-root]/robots.txt (the same directory as index.php and composer.json). On Magento Cloud / Adobe Commerce Cloud, the robots.txt is served from the pub/ directory. The Admin panel editor (Content → Design → Configuration → Search Engines Robots) lets you add custom instructions that get appended to the generated file — this is the easiest way to add AI bot blocking rules.
How do I add a noai meta tag to every Magento page?
The cleanest method is via your theme's layout XML. Create or edit app/design/frontend/[Vendor]/[theme]/Magento_Theme/layout/default_head_blocks.xml and add a <meta name="robots" content="noai, noimageai"> element inside the <head> block. After saving, run bin/magento cache:flush to apply changes. Alternatively, create a small custom module that uses a layout XML observer to inject the meta tag globally.
Will blocking AI bots affect Magento SEO?
No. Blocking GPTBot, ClaudeBot, CCBot, Google-Extended, Diffbot, and other AI training/scraping bots has zero effect on Googlebot or Bingbot. Your Magento product pages, category pages, and blog content will continue to rank normally in Google and Bing search results. Magento's native sitemap generation, canonical URLs, and meta robots settings are completely unaffected.
Why is Diffbot especially dangerous for Magento stores?
Diffbot is a commercial data broker that crawls e-commerce stores and sells structured product data — names, descriptions, prices, SKUs, reviews — to AI companies and hedge funds. For Magento stores, this means your entire product catalog, pricing strategy, and customer reviews can end up in AI training datasets and competitor intelligence databases without your knowledge. Unlike GPTBot, which honors robots.txt, Diffbot has historically been less consistent about respecting it on commercial stores, so server-level or WAF blocking is the safer option.
How do I block AI bots on Adobe Commerce Cloud (Magento Cloud)?
On Adobe Commerce Cloud, you have several options: (1) Use the Admin panel robots.txt editor — this works the same as on-prem; (2) Add a physical robots.txt to your project's pub/ directory and commit it to git — this file will override the Admin-generated one; (3) Add nginx user-agent blocking rules to your .magento.app.yaml configuration using the 'web.locations' property; (4) Use Fastly WAF (included with Commerce Cloud) to block bots at the CDN edge. The Fastly route is most effective for stopping bots like Bytespider that ignore robots.txt.
Does Magento have a built-in way to block specific user agents?
Magento does not have a built-in UI for blocking specific user agents (other than through robots.txt). For server-level blocking, use .htaccess (Apache installs ship with a .htaccess in the pub/ directory) or nginx location blocks. For Adobe Commerce Cloud with Fastly, create a custom WAF rule to block by user agent — this is more reliable than robots.txt because it rejects requests before they reach the PHP application layer.
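Whichever server-level method you use, you can verify it from the command line. The live check is a curl with a spoofed User-Agent (the domain below is a placeholder); the regex itself can be sanity-checked locally first:

```shell
# Live check (placeholder domain) — expect HTTP 403 once blocking is active:
#   curl -s -o /dev/null -w "%{http_code}\n" -A "GPTBot" https://yourstore.example/
# Local sanity check of the user-agent pattern (abbreviated list):
pattern='GPTBot|ChatGPT-User|ClaudeBot|CCBot|Bytespider|Diffbot'
for ua in "Mozilla/5.0 (compatible; GPTBot/1.0)" "Mozilla/5.0 (X11; Linux x86_64)"; do
  if echo "$ua" | grep -Eiq "$pattern"; then
    echo "BLOCKED: $ua"
  else
    echo "ALLOWED: $ua"
  fi
done
```

A normal browser user-agent should print ALLOWED; any AI bot string in the pattern should print BLOCKED.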