Google-Extended is Google's dedicated AI training control token for Gemini and Vertex AI. Here's how to opt out without touching your Search rankings.
Google-Extended is a standalone user agent token that Google introduced in September 2023 specifically to control whether content is used to train its AI products: Gemini (formerly Bard) and Vertex AI's generative APIs. It is distinct from Googlebot, which crawls exclusively for Search indexing.
Before Google-Extended existed, Google had no clean separation between Search crawling and AI training. The introduction of this separate token was a direct response to publisher pressure — giving websites a way to opt out of AI training without sacrificing Search visibility.
Content accessible to Google-Extended can be used to train the models that make Gemini's responses more accurate, up-to-date, and factually grounded. If you are a news publisher, creative content creator, or any site whose unique writing represents commercial value, you may have legitimate reasons to opt out.
| | Google-Extended | Googlebot |
|---|---|---|
| Purpose | AI model training (Gemini, Vertex AI) | Search indexing and ranking |
| Affects SEO? | No | Yes — directly |
| User agent token | Google-Extended | Googlebot |
| Respects robots.txt? | Yes ✓ | Yes ✓ |
| Safe to block? | Yes — no SEO consequence | Only if you want to disappear from Google |
| Introduced | September 2023 | 1996 |
## robots.txt (Recommended)

The robots.txt block is the standard, Google-endorsed method for opting out of Gemini AI training. Add these two lines to your robots.txt file:
```
User-agent: Google-Extended
Disallow: /
```
Place at the top of your robots.txt or after your existing Googlebot rules.
```
# Block Google-Extended from premium/paywalled content
User-agent: Google-Extended
Disallow: /articles/
Disallow: /premium/
Disallow: /blog/

# Allow Googlebot to index everything normally
User-agent: Googlebot
Allow: /
```
```
# Block AI training crawlers — preserves Search indexing
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

# Allow all standard search bots
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /
```
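A combined policy like the one above can also be generated programmatically. A minimal sketch (the bot lists and the `buildRobotsTxt` helper are illustrative, not an official API):

```typescript
// Sketch: build a robots.txt policy that blocks AI training crawlers
// while leaving standard search bots unrestricted.
// The bot lists below are illustrative; adjust them to your own policy.
const AI_TRAINING_BOTS = [
  'Google-Extended',
  'GPTBot',
  'ClaudeBot',
  'PerplexityBot',
  'Bytespider',
];

const SEARCH_BOTS = ['Googlebot', 'Bingbot'];

function buildRobotsTxt(): string {
  const blocks = AI_TRAINING_BOTS.map(
    (bot) => `User-agent: ${bot}\nDisallow: /`,
  );
  const allows = SEARCH_BOTS.map((bot) => `User-agent: ${bot}\nAllow: /`);
  return [...blocks, ...allows].join('\n\n') + '\n';
}

console.log(buildRobotsTxt());
```

Serving the generated string from your web framework's `/robots.txt` route keeps the bot list in one place if you block or unblock crawlers over time.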
## noai Meta Tag

For granular control — blocking specific pages from AI training without modifying robots.txt — you can add the noai and noimageai meta tags. Support for these tags varies by crawler.
```html
<!-- Block AI training on this page (text + images) -->
<meta name="robots" content="noai, noimageai">
```
⚠️ Important caveat
The noai meta tag is a proposed standard with mixed adoption, and Google's documented opt-out mechanism for Google-Extended is robots.txt, not a meta tag. Use the robots.txt block above as your primary method and the meta tag as belt-and-suspenders coverage.
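If you do deploy the meta tag, you can audit pages to confirm it is present. A minimal sketch (naive string matching rather than a full HTML parser, and the `hasNoaiMeta` name is my own):

```typescript
// Sketch: check raw HTML for a robots meta tag whose content includes "noai".
// Naive regex matching — fine for a quick audit, not for adversarial HTML.
function hasNoaiMeta(html: string): boolean {
  const metaTags = html.match(/<meta[^>]+>/gi) ?? [];
  return metaTags.some(
    (tag) =>
      /name=["']robots["']/i.test(tag) &&
      /content=["'][^"']*\bnoai\b[^"']*["']/i.test(tag),
  );
}
```

Running this over the HTML of each template (home page, article page, paywalled page) catches the common mistake of adding the tag to only one layout.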
For Next.js apps, generate robots.txt programmatically via the App Router:
```typescript
import { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        // Block Google-Extended from AI training
        userAgent: 'Google-Extended',
        disallow: ['/'],
      },
      {
        // Allow Googlebot to index normally
        userAgent: 'Googlebot',
        allow: ['/'],
      },
      {
        // Block other AI training crawlers
        userAgent: ['GPTBot', 'ClaudeBot', 'PerplexityBot', 'Bytespider'],
        disallow: ['/'],
      },
      {
        // Allow all other well-behaved bots
        userAgent: '*',
        allow: ['/'],
      },
    ],
    sitemap: 'https://yoursite.com/sitemap.xml',
  };
}
```

## Verifying the Block

After updating robots.txt, verify the block is correctly configured using these methods:
Visit your robots.txt directly and confirm the Google-Extended rules appear:
https://yoursite.com/robots.txt
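This manual check can be scripted. A sketch that parses robots.txt text for a group covering Google-Extended with a full disallow (the parsing is simplified, and the URL in the usage comment is a placeholder):

```typescript
// Sketch: verify a robots.txt body blocks Google-Extended entirely.
// Handles groups with multiple stacked User-agent lines; pure function
// so it can be tested without a network call.
function blocksGoogleExtended(robotsTxt: string): boolean {
  let groupHasGE = false; // current group names Google-Extended
  let inAgentRun = false; // we are inside a run of User-agent lines
  for (const raw of robotsTxt.split('\n')) {
    const line = raw.trim();
    if (/^user-agent:/i.test(line)) {
      if (!inAgentRun) groupHasGE = false; // a new group starts
      inAgentRun = true;
      if (/google-extended/i.test(line)) groupHasGE = true;
    } else {
      inAgentRun = false;
      if (groupHasGE && /^disallow:\s*\/\s*$/i.test(line)) return true;
    }
  }
  return false;
}

// Usage (Node 18+; URL is a placeholder):
// const txt = await (await fetch('https://yoursite.com/robots.txt')).text();
// console.log(blocksGoogleExtended(txt) ? 'Blocked' : 'NOT blocked');
```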
Use Google Search Console's robots.txt report to confirm Google has fetched your latest robots.txt and parsed your rules without errors (the old standalone robots.txt Tester has been retired):

Search Console → Settings → robots.txt
Scan your access logs for AI crawler activity. Note that Google-Extended does not send its own user agent string — it is a robots.txt control token, and fetching is done by Google's regular crawlers — but other AI bots do identify themselves, and you can confirm they are hitting robots.txt and backing off:

```shell
# Apache / nginx access log: check what AI crawlers are requesting
grep -E "GPTBot|ClaudeBot|PerplexityBot|Bytespider" /var/log/nginx/access.log | tail -20

# Look for: 200 on /robots.txt followed by no further requests
# If you see 200s on content pages, your block isn't working
```
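Log checks like this can also be scripted. A sketch that extracts the content paths a given crawler requested from combined-format (Apache/nginx) log lines — the bot token is a parameter, and the log format is an assumption:

```typescript
// Sketch: list content paths requested by a crawler user agent token,
// from combined-format access log lines. Any path other than /robots.txt
// in the result suggests the block is not being respected.
function crawlerContentHits(logLines: string[], botToken: string): string[] {
  return logLines
    .filter((line) => line.includes(botToken))
    .map((line) => {
      // The request field looks like: "GET /path HTTP/1.1"
      const m = line.match(/"(?:GET|HEAD|POST) (\S+) HTTP/);
      return m ? m[1] : '';
    })
    .filter((path) => path !== '' && path !== '/robots.txt');
}
```

Feeding it a day's worth of log lines per bot token gives a quick "who is still crawling what" summary without leaving Node.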
Run your site through Open Shadow's robot checker to confirm Google-Extended is blocked alongside other AI bots:
→ Check your robots.txt now

Run a free AI visibility check to see which bots have access to your content — and what they can see.
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.
Scan My Site Free →