openshadow.io/guides/blocking-ai-bots-woocommerce

How to Block AI Bots on WooCommerce: Complete 2026 Guide

WooCommerce stores are high-value targets for AI crawlers. Your product descriptions, pricing, reviews, and structured catalog are exactly what training datasets are built from — and the WooCommerce REST API at /wp-json/wc/ serves it all as machine-readable JSON. This guide covers every blocking method from basic robots.txt to server-level .htaccess rules.

8 min read · Updated April 2026 · WooCommerce 8.x / WordPress 6.x

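WooCommerce REST API risk: Unlike a static site, WooCommerce serves catalog data as structured JSON under /wp-json/wc/. The Store API (for example /wp-json/wc/store/v1/products) is readable without authentication by default. The /wp-json/wc/v3/ endpoints, such as /wp-json/wc/v3/products and /wp-json/wc/v3/products/categories, normally require API keys, though plugins or custom permission callbacks can open them up. Publicly readable catalog JSON is a ready-made AI training dataset. Block /wp-json/wc/ in robots.txt and in .htaccess.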

Methods overview

Method | When to use
robots.txt (protect shop & product pages) | Always; the first thing to configure
Block WooCommerce REST API (/wp-json/wc/) | If you have a public product catalog
noai meta tag via functions.php | For product pages and shop pages
.htaccess server-level blocking | Apache hosting; most effective
Cloudflare WAF rule | Managed/shared hosting, or Kinsta/WP Engine

1. robots.txt — protect shop & product pages

WooCommerce generates several URL patterns worth protecting: /shop/, /product/, /product-category/, /cart/, /checkout/, and the REST API. Cart and checkout should always be disallowed — even for regular search engines.

Option A — Yoast SEO file editor (no SSH required)

In your WordPress dashboard: SEO → Tools → File Editor → Edit robots.txt. Yoast SEO (free) and Yoast WooCommerce SEO both expose this editor.

robots.txt (WordPress root)
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Diffbot
Disallow: /

# All other crawlers (not only AI bots): keep out of WooCommerce-specific paths
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /wp-json/wc/

Option B — Physical file via FTP/SFTP

Connect to your WordPress root (the same directory as wp-config.php) and edit or create robots.txt directly. If Yoast has previously written a robots.txt, your edits may be overwritten the next time Yoast regenerates it; use Option A, or disable Yoast's robots.txt management first (SEO → Search Appearance → Advanced).

Yoast conflict warning: If Yoast SEO is active, it generates robots.txt dynamically on certain requests and may overwrite your physical file. Either use Yoast's editor (Option A) or go to SEO → Search Appearance → Advanced and disable "Yoast SEO manages robots.txt" before editing the physical file.
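
If you would rather not maintain a physical file at all, WordPress core passes its generated (virtual) robots.txt through the robots_txt filter. The following is a minimal sketch, assuming no physical robots.txt exists in the web root (a physical file is served instead of the virtual one). The function name openshadow_ai_robots_rules and the shortened bot list are illustrative, not part of any plugin.

```php
<?php
/**
 * Append AI-bot rules to WordPress's virtual robots.txt.
 * Illustrative sketch: the function name and bot list are assumptions.
 *
 * @param string $output Existing robots.txt content generated by WordPress.
 * @param bool   $public Whether the site is visible to search engines (unused here).
 * @return string robots.txt content with one "Disallow: /" group per bot appended.
 */
function openshadow_ai_robots_rules( $output, $public = true ) {
    $bots = array( 'GPTBot', 'ClaudeBot', 'CCBot', 'Google-Extended', 'Bytespider' );

    foreach ( $bots as $bot ) {
        $output .= "\nUser-agent: {$bot}\nDisallow: /\n";
    }

    return $output;
}

// Register only when running inside WordPress, so the function can also be
// exercised on its own from the command line.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'robots_txt', 'openshadow_ai_robots_rules', 10, 2 );
}
```

After adding it, load /robots.txt in a browser to confirm the extra groups appear. If they do not, a physical file or another plugin is probably serving robots.txt instead.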

2. Block WooCommerce REST API (/wp-json/wc/)

The WooCommerce REST API serves your product catalog, categories, tags, and attributes as structured JSON. The Store API endpoints under /wp-json/wc/store/ are publicly readable by default; endpoints like /wp-json/wc/v3/products normally require API keys, and order data always requires authentication. AI training crawlers can fetch and ingest any JSON response that is publicly readable.

robots.txt disallow for REST API

robots.txt — add to each AI bot block
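# Option 1: block the whole site ("Disallow: /" already covers /wp-json/wc/)
User-agent: GPTBot
Disallow: /

# Option 2: protect only the API and leave pages crawlable.
# Use instead of Option 1, not alongside it; /wp-json/wc/ already covers /wp-json/wc/v3/.
User-agent: GPTBot
Disallow: /wp-json/wc/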

.htaccess hard block for REST API

For a hard server-level block that stops AI bots from accessing the API regardless of robots.txt compliance (Bytespider and Diffbot often ignore robots.txt):

.htaccess — add above the WordPress block
# Block AI bots from WooCommerce REST API
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/wp-json/wc/ [NC]
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Diffbot|cohere-ai|AI2Bot|DeepSeekBot|MistralBot|Amazonbot|Applebot-Extended|xAI-Bot|OAI-SearchBot|ChatGPT-User) [NC]
RewriteRule ^ - [F,L]
</IfModule>

# BEGIN WordPress
# @see https://wordpress.org/documentation/article/htaccess/
<IfModule mod_rewrite.c>
RewriteEngine On
...
</IfModule>
# END WordPress

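Require WordPress auth for all REST endpoints: If your theme or plugins don't need public REST API access, you can also require a logged-in user for all /wp-json/ requests. Add to functions.php: add_filter('rest_authentication_errors', fn($r) => (is_null($r) && ! is_user_logged_in()) ? new WP_Error('rest_forbidden', 'Forbidden', ['status' => 401]) : $r); The is_user_logged_in() check matters: without it the filter rejects every request, including the block editor's own calls from logged-in users. Test thoroughly, as this can still break page builders and any front-end feature that calls the REST API as a logged-out visitor, including WooCommerce's block-based cart and checkout.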
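
On hosts where you cannot edit .htaccess, a similar user-agent check can run inside WordPress itself. This is a rough sketch using core's rest_pre_dispatch filter. The function names and the shortened token list are assumptions for illustration. Because WordPress has to boot before the check runs, it costs more per request than a server-level or Cloudflare block.

```php
<?php
/**
 * Return true when the User-Agent contains a listed AI crawler token.
 * Matching is case-insensitive. The token list is a shortened, illustrative one.
 */
function openshadow_is_ai_user_agent( $user_agent ) {
    $tokens = array(
        'GPTBot', 'ClaudeBot', 'anthropic-ai', 'Bytespider',
        'CCBot', 'Diffbot', 'PerplexityBot', 'meta-externalagent',
    );

    foreach ( $tokens as $token ) {
        if ( false !== stripos( (string) $user_agent, $token ) ) {
            return true;
        }
    }

    return false;
}

/**
 * Reject WooCommerce REST routes (/wc/...) for matching crawlers
 * before any endpoint handler runs. Other requests pass through unchanged.
 */
function openshadow_block_ai_on_wc_rest( $result, $server, $request ) {
    $ua = isset( $_SERVER['HTTP_USER_AGENT'] ) ? $_SERVER['HTTP_USER_AGENT'] : '';

    if ( 0 === strpos( $request->get_route(), '/wc/' ) && openshadow_is_ai_user_agent( $ua ) ) {
        return new WP_Error( 'rest_forbidden', 'Forbidden', array( 'status' => 403 ) );
    }

    return $result;
}

// Register only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'rest_pre_dispatch', 'openshadow_block_ai_on_wc_rest', 10, 3 );
}
```

The route passed to the filter omits the /wp-json prefix, which is why the check looks for /wc/ at the start of the string.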

3. noai meta tag via functions.php

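Add <meta name="robots" content="noai, noimageai"> to product pages and the shop page. This asks AI crawlers not to train on the page content or product images. noai and noimageai are non-standard directives, so treat the tag as an opt-out signal that only cooperating crawlers honor, not as enforcement.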

Global noai — all WooCommerce pages

Add to your child theme's functions.php. Uses WooCommerce's is_woocommerce() conditional to target only shop, product, and archive pages.

functions.php (child theme)
<?php
/**
 * Add noai meta tag to WooCommerce pages
 * Targets: shop, product, product category/tag pages
 */
function openshadow_noai_on_woocommerce() {
    if ( function_exists( 'is_woocommerce' ) && is_woocommerce() ) {
        echo '<meta name="robots" content="noai, noimageai">' . PHP_EOL;
    }
}
add_action( 'wp_head', 'openshadow_noai_on_woocommerce' );

Site-wide noai for all pages

functions.php — global variant
<?php
function openshadow_noai_global() {
    echo '<meta name="robots" content="noai, noimageai">' . PHP_EOL;
}
add_action( 'wp_head', 'openshadow_noai_global' );

Per-product override with ACF or custom meta

functions.php — per-product control
<?php
/**
 * Add noai meta tag to products unless explicitly opted in.
 * Custom product meta field: _allow_ai_training (value: '1' = allow)
 */
function openshadow_product_noai() {
    if ( ! is_singular( 'product' ) ) {
        return;
    }
    $product_id = get_the_ID();
    $allow_ai   = get_post_meta( $product_id, '_allow_ai_training', true );

    if ( '1' !== $allow_ai ) {
        echo '<meta name="robots" content="noai, noimageai">' . PHP_EOL;
    }
}
add_action( 'wp_head', 'openshadow_product_noai' );
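
A meta tag only exists in HTML, so it cannot label the product image files themselves. On Apache, the same signal can be sent as an X-Robots-Tag response header. The fragment below is a sketch that assumes mod_headers is enabled on your host; the extension list is illustrative. Like the meta tag, it is advisory and only cooperating crawlers act on it.

```apache
# Send the noai / noimageai signal with image responses
<IfModule mod_headers.c>
  <FilesMatch "\.(jpe?g|png|gif|webp|avif)$">
    Header set X-Robots-Tag "noai, noimageai"
  </FilesMatch>
</IfModule>
```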

4. .htaccess server-level blocking

Apache-based hosting (most shared hosting) lets you block AI bots at the server level before WordPress even loads. This stops bots that ignore robots.txt, like Bytespider and Diffbot. Add rules above the # BEGIN WordPress block.

Block all AI bots site-wide

.htaccess — add above # BEGIN WordPress
# Block AI training and scraping bots
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|webzio-extended|gemini-deep-research) [NC]
RewriteRule ^ - [F,L]
</IfModule>

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

Block only WooCommerce paths (selective)

.htaccess — path-selective variant
<IfModule mod_rewrite.c>
RewriteEngine On

# Block AI bots from shop and product pages only
RewriteCond %{REQUEST_URI} ^/(shop|product|product-category|product-tag|wp-json/wc)(/|$) [NC]
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|anthropic-ai|Google-Extended|Bytespider|CCBot|Diffbot|meta-externalagent|cohere-ai) [NC]
RewriteRule ^ - [F,L]
</IfModule>

Hosting compatibility: This works on Apache (most shared hosts: Bluehost, SiteGround, Hostinger, GoDaddy, DreamHost). For nginx (Kinsta, Flywheel, WP Engine), use the Cloudflare WAF approach or ask your host to add nginx if ($http_user_agent) rules. nginx does not read .htaccess.
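
For reference when talking to your host, an nginx rule equivalent to the site-wide .htaccess block might look like the sketch below. It assumes the rule is placed inside the server { } block for your store, which on managed hosts usually means asking support to add it. The bot list is shortened for illustration.

```nginx
# Inside the server { } block for the store.
# ~* is a case-insensitive regex match on the User-Agent header;
# matching requests get a 403 before PHP or WordPress runs.
if ($http_user_agent ~* "(GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|Bytespider|CCBot|PerplexityBot|meta-externalagent|Diffbot|cohere-ai)") {
    return 403;
}
```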

5. Cloudflare WAF rule

Cloudflare's WAF blocks bots before traffic reaches your WooCommerce server. This is the recommended approach for managed WordPress hosts (Kinsta, WP Engine, Flywheel, Pressable) that don't expose .htaccess or nginx config files.

Cloudflare dashboard WAF rule

Security → WAF → Custom Rules → Create rule. Set action to Block.

Cloudflare WAF — Expression editor
(
  http.user_agent contains "GPTBot" or
  http.user_agent contains "ChatGPT-User" or
  http.user_agent contains "OAI-SearchBot" or
  http.user_agent contains "ClaudeBot" or
  http.user_agent contains "anthropic-ai" or
  http.user_agent contains "Google-Extended" or
  http.user_agent contains "Bytespider" or
  http.user_agent contains "CCBot" or
  http.user_agent contains "PerplexityBot" or
  http.user_agent contains "meta-externalagent" or
  http.user_agent contains "Diffbot" or
  http.user_agent contains "cohere-ai" or
  http.user_agent contains "AI2Bot" or
  http.user_agent contains "DeepSeekBot" or
  http.user_agent contains "MistralBot" or
  http.user_agent contains "Amazonbot" or
  http.user_agent contains "Applebot-Extended" or
  http.user_agent contains "xAI-Bot" or
  http.user_agent contains "omgili" or
  http.user_agent contains "omgilibot" or
  http.user_agent contains "webzio-extended" or
  http.user_agent contains "gemini-deep-research"
)
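
One caveat with the expression above: Cloudflare's contains operator is case-sensitive, so "AI2Bot" does not match a crawler that identifies itself as "Ai2Bot-Dolma". A possible workaround, sketched here with a shortened token list, is to lowercase the header with the lower() function and compare against lowercase tokens:

```
(
  lower(http.user_agent) contains "gptbot" or
  lower(http.user_agent) contains "claudebot" or
  lower(http.user_agent) contains "bytespider" or
  lower(http.user_agent) contains "ai2bot" or
  lower(http.user_agent) contains "omgili"
)
```

Because this is substring matching, "ai2bot" also covers Ai2Bot-Dolma and "omgili" also covers omgilibot.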

WooCommerce + Cloudflare: protect product pages only

Use this expression to block AI bots from product/shop pages and the REST API while allowing them to crawl your blog and informational content:

Cloudflare WAF — path + user-agent expression
(
  (
    http.request.uri.path contains "/shop" or
    http.request.uri.path contains "/product" or
    http.request.uri.path contains "/cart" or
    http.request.uri.path contains "/checkout" or
    http.request.uri.path contains "/wp-json/wc/"
  )
  and
  (
    http.user_agent contains "GPTBot" or
    http.user_agent contains "ClaudeBot" or
    http.user_agent contains "CCBot" or
    http.user_agent contains "Bytespider" or
    http.user_agent contains "Diffbot" or
    http.user_agent contains "Google-Extended"
  )
)

Free plan note: Cloudflare's free plan supports 5 custom WAF rules, so combine all user agents into a single rule (as shown above) rather than creating one rule per bot. Bot Fight Mode is also available on the free plan; paid plans (Pro $20/mo) add Super Bot Fight Mode and more custom rules. Both modes act on traffic that Cloudflare identifies as automated, which can include AI crawlers.

AI bots targeting WooCommerce stores

25 bots that actively crawl e-commerce sites. Diffbot is particularly aggressive on WooCommerce — it's a commercial data broker that resells scraped product catalogs.

Bot | Operator
GPTBot | OpenAI
ChatGPT-User | OpenAI
OAI-SearchBot | OpenAI
ClaudeBot | Anthropic
anthropic-ai | Anthropic
Google-Extended | Google
Bytespider | ByteDance
CCBot | Common Crawl
PerplexityBot | Perplexity
meta-externalagent | Meta
Amazonbot | Amazon
Applebot-Extended | Apple
xAI-Bot | xAI
DeepSeekBot | DeepSeek
MistralBot | Mistral
Diffbot | Diffbot
cohere-ai | Cohere
AI2Bot | Allen Institute
Ai2Bot-Dolma | Allen Institute
YouBot | You.com
DuckAssistBot | DuckDuckGo
omgili | Webz.io
omgilibot | Webz.io
webzio-extended | Webz.io
gemini-deep-research | Google

Method comparison

Method | Stops Bytespider? | Stops Diffbot?
robots.txt | No (ignores) | No (ignores)
noai meta tag | No | No
.htaccess | Yes | Yes
Cloudflare WAF | Yes | Yes

FAQ

Does blocking AI bots in robots.txt affect WooCommerce SEO?

No. Blocking AI training bots (GPTBot, CCBot, ClaudeBot) does not affect Google or Bing search rankings. These are separate crawlers. Your products will still be indexed by Googlebot and Bingbot — those are not in the block list unless you add them explicitly.

Should I block AI bots from my WooCommerce product pages?

It depends. Block training bots (GPTBot, CCBot) if you want to prevent AI companies from using your product descriptions. Allow AI search bots (OAI-SearchBot, PerplexityBot) if you want your products to appear in AI shopping recommendations. You can configure both selectively in robots.txt — different rules per user-agent.
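
As a sketch of that selective setup, the robots.txt below blocks two training crawlers everywhere while letting two AI search crawlers read product pages but not cart, checkout, account, or API paths. Which bots go in which group is your decision; the grouping here is only an example.

```
# Training crawlers: blocked everywhere
User-agent: GPTBot
User-agent: CCBot
Disallow: /

# AI search crawlers: pages allowed, transactional and API paths blocked
User-agent: OAI-SearchBot
User-agent: PerplexityBot
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /wp-json/wc/
```

Several User-agent lines can share one group of rules, which keeps the file short when many bots get the same treatment.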

Is the WooCommerce REST API crawled by AI bots?

Yes, wherever it is publicly readable. The Store API (/wp-json/wc/store/v1/products) returns catalog data as structured JSON without authentication by default, while /wp-json/wc/v3/products normally requires API keys. Public catalog JSON is high-value training data. Block /wp-json/wc/ in robots.txt and optionally in .htaccess or Cloudflare WAF.

Does the Yoast WooCommerce SEO plugin help block AI bots?

Yoast SEO exposes a robots.txt editor at SEO → Tools → File Editor. This is the easiest way to add AI bot disallow rules without SSH access. However, Yoast may periodically overwrite the physical robots.txt — if you edit the physical file instead, disable Yoast's robots.txt management in SEO → Search Appearance → Advanced first.

What WooCommerce paths should I prioritize blocking?

Priority order: (1) /wp-json/wc/ — structured product data, highest AI training value; (2) /product/ and /shop/ — product descriptions and images; (3) /cart/ and /checkout/ — always disallow (no training value, exposes session patterns).

My WooCommerce store is on managed WordPress hosting — what do I do?

Kinsta, WP Engine, and Flywheel use nginx and don't expose .htaccess. Your options are: (1) Yoast SEO robots.txt editor for signals-only blocking; (2) Cloudflare WAF if your DNS proxies through Cloudflare (free plan works); (3) Ask your host's support to add nginx user-agent rules — most managed hosts will do this on request.
