How to Block AI Bots on TanStack Start
TanStack Start is a full-stack React framework built on TanStack Router — the type-safe, file-based router that took the React ecosystem by storm. TanStack Start adds server-side rendering, server functions, and a full HTTP server via Vinxi and Nitro. This layered architecture gives you two distinct places for AI bot protection: the TanStack Router layer (file-based routes, head() meta tags) and the Nitro server layer (routeRules headers, server middleware for UA blocking). This guide covers all four protection layers with TanStack Start's specific APIs.
The key split: a router layer (app/routes/) and a Nitro server layer (HTTP server, server/ directory). Nitro runs before the router; server middleware intercepts all requests first. Use Nitro for headers and hard blocking; use the router layer for meta tags.

1. robots.txt
TanStack Start is built on Vite. Files in public/ are served at the root URL automatically — the same convention as Vite, Next.js, and SvelteKit.
Static robots.txt
Create public/robots.txt:
# Block AI training crawlers
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: meta-externalagent
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Diffbot
Disallow: /
# Allow standard search crawlers
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: *
Allow: /

Dynamic robots.txt via Nitro server route
For environment-aware content, use a Nitro server route. TanStack Start uses Nitro as its HTTP server layer — files in server/routes/ are served before TanStack Router handles React routes.
Create server/routes/robots.txt.ts:
// server/routes/robots.txt.ts
import { defineEventHandler, setResponseHeader } from 'h3'

export default defineEventHandler((event) => {
  const isDev = process.env.NODE_ENV !== 'production'
  setResponseHeader(event, 'Content-Type', 'text/plain; charset=utf-8')

  // Block all crawlers outside production
  if (isDev) {
    return `User-agent: *
Disallow: /
`
  }

  return `User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: meta-externalagent
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: *
Allow: /
`
})

Nitro server routes (server/routes/) take priority over static files in public/ when paths conflict. If you have both public/robots.txt and server/routes/robots.txt.ts, the server route wins. Use one or the other.

2. noai meta via head() in routes
TanStack Start uses TanStack Router's head() option for document head management. Add it to your root route for site-wide coverage.
Root route — app/routes/__root.tsx
// app/routes/__root.tsx
import { createRootRoute, Outlet } from '@tanstack/react-router'
import { Meta, Scripts } from '@tanstack/start'
export const Route = createRootRoute({
  head: () => ({
    meta: [
      { charSet: 'utf-8' },
      { name: 'viewport', content: 'width=device-width, initial-scale=1' },
      // AI bot protection — applies to every page
      { name: 'robots', content: 'noai, noimageai' },
    ],
    links: [
      { rel: 'stylesheet', href: '/styles.css' },
    ],
  }),
  component: RootComponent,
})

function RootComponent() {
  return (
    <html lang="en">
      <head>
        <Meta />
      </head>
      <body>
        <Outlet />
        <Scripts />
      </body>
    </html>
  )
}

The head() function returns an object with meta, links, and scripts arrays. The <Meta /> component renders all collected meta tags from root to the current route.
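The override rule is worth internalizing: when the root and a leaf route both declare a meta tag with the same name, the leaf-most value wins. The sketch below is a hypothetical model of that merge behavior (mergeMeta is an illustrative helper, not TanStack Router's actual implementation):

```typescript
// Illustrative model: meta arrays from root to leaf are merged by `name`,
// and later (leaf) entries replace earlier (root) ones.
type MetaTag = { name?: string; content?: string; title?: string; charSet?: string }

function mergeMeta(...routeMeta: MetaTag[][]): MetaTag[] {
  const byName = new Map<string, MetaTag>()
  const unnamed: MetaTag[] = []
  for (const tags of routeMeta) {
    for (const tag of tags) {
      if (tag.name) byName.set(tag.name, tag) // leaf overwrites root
      else unnamed.push(tag)
    }
  }
  return [...unnamed, ...byName.values()]
}

// Root declares noai; a blog leaf route opts back in to indexing.
const merged = mergeMeta(
  [{ charSet: 'utf-8' }, { name: 'robots', content: 'noai, noimageai' }],
  [{ name: 'robots', content: 'index, follow' }],
)
```

Run against the example inputs, merged contains a single robots tag carrying the leaf route's value.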
Per-route meta override
Individual routes can override the root robots meta by adding their own head() option:
// app/routes/blog.$slug.tsx
import { createFileRoute } from '@tanstack/react-router'

export const Route = createFileRoute('/blog/$slug')({
  loader: async ({ params }) => {
    return await fetchPost(params.slug)
  },
  head: ({ loaderData }) => ({
    meta: [
      { title: loaderData.title },
      // Allow AI indexing for public blog posts
      { name: 'robots', content: 'index, follow' },
    ],
  }),
  component: BlogPost,
})

Meta values declared in createFileRoute take precedence over the root route defaults.

Data-driven meta from loader
For pages where robots values come from a CMS or database, use the head() function with loader data:
export const Route = createFileRoute('/pages/$slug')({
  loader: async ({ params }) => {
    const page = await fetchPage(params.slug)
    return page
  },
  head: ({ loaderData }) => ({
    meta: [
      { name: 'robots', content: loaderData?.robots ?? 'noai, noimageai' },
      { title: loaderData?.title ?? 'Page' },
    ],
  }),
  component: DynamicPage,
})

3. X-Robots-Tag via routeRules
TanStack Start's underlying Nitro server supports routeRules — a declarative way to set headers, redirects, and cache rules for URL patterns. Configure them in app.config.ts:
// app.config.ts
import { defineConfig } from '@tanstack/start/config'
export default defineConfig({
  server: {
    // Nitro server configuration
    routeRules: {
      // Apply X-Robots-Tag to all routes
      '/**': {
        headers: {
          'X-Robots-Tag': 'noai, noimageai',
        },
      },
    },
  },
})

Nitro applies these rules at the server level — before React rendering. All HTML pages, API responses, and server function responses receive the header.
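Nitro matches these glob patterns against the request path, with more specific rules (like /api/**) layered over the catch-all /**. As a rough mental model of the pattern matching only (Nitro actually uses a radix tree internally, so this is an illustration, not its implementation):

```typescript
// Simplified model of routeRules glob matching: a '**' segment matches
// any remaining path. Illustrative only — not Nitro's real matcher.
function matchesRule(pattern: string, path: string): boolean {
  const source = pattern
    .split('/')
    .map((seg) =>
      seg === '**' ? '.*' : seg.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
    )
    .join('/')
  return new RegExp(`^${source}$`).test(path)
}
```

Under this model, '/**' matches every page while '/api/**' only matches paths under /api/, which is why the API rule can carve out an exception to the global header.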
Selective headers by path
To apply different headers to different paths (e.g., no robots header on API routes):
// app.config.ts
export default defineConfig({
  server: {
    routeRules: {
      // HTML pages — block AI training
      '/**': {
        headers: { 'X-Robots-Tag': 'noai, noimageai' },
      },
      // API routes — no robots header needed
      '/api/**': {
        headers: { 'X-Robots-Tag': undefined },
      },
    },
  },
})

4. Hard 403 via Nitro server middleware
Nitro server middleware runs on every request before routing. Use it to reject AI crawlers before any React rendering occurs.
Create server/middleware/bot-block.ts:
// server/middleware/bot-block.ts
import { defineEventHandler, getHeader, createError } from 'h3'
const AI_BOT_PATTERNS = [
  'GPTBot',
  'ClaudeBot',
  'Claude-Web',
  'anthropic-ai',
  'CCBot',
  'Google-Extended',
  'PerplexityBot',
  'Applebot-Extended',
  'Amazonbot',
  'meta-externalagent',
  'Bytespider',
  'Diffbot',
  'YouBot',
  'cohere-ai',
]

const EXEMPT_PATHS = [
  '/robots.txt',
  '/sitemap.xml',
  '/favicon.ico',
]

function isAIBot(ua: string): boolean {
  const lower = ua.toLowerCase()
  return AI_BOT_PATTERNS.some((p) => lower.includes(p.toLowerCase()))
}

export default defineEventHandler((event) => {
  const path = event.path ?? ''

  // Always allow crawlers to read robots.txt
  if (EXEMPT_PATHS.some((p) => path === p)) {
    return
  }

  const ua = getHeader(event, 'user-agent') ?? ''
  if (isAIBot(ua)) {
    throw createError({
      statusCode: 403,
      statusMessage: 'Forbidden',
    })
  }
})

Nitro auto-registers files in server/middleware/ in alphabetical order. No manual registration is needed — just create the file. Middleware runs on every request before route handlers.

Combined middleware — block + headers
Combine hard blocking and X-Robots-Tag in one middleware file instead of relying on both routeRules and a separate middleware:
// server/middleware/ai-protection.ts
import {
  defineEventHandler,
  getHeader,
  setResponseHeader,
  createError,
} from 'h3'

const AI_BOT_PATTERNS = [
  'GPTBot', 'ClaudeBot', 'Claude-Web', 'anthropic-ai',
  'CCBot', 'Google-Extended', 'PerplexityBot', 'Applebot-Extended',
  'Amazonbot', 'meta-externalagent', 'Bytespider', 'Diffbot',
  'YouBot', 'cohere-ai',
]

export default defineEventHandler((event) => {
  const path = event.path ?? ''
  if (path === '/robots.txt' || path === '/sitemap.xml') {
    return // proceed normally
  }

  const ua = getHeader(event, 'user-agent') ?? ''
  if (AI_BOT_PATTERNS.some((p) => ua.toLowerCase().includes(p.toLowerCase()))) {
    throw createError({ statusCode: 403, statusMessage: 'Forbidden' })
  }

  // Legitimate request — add X-Robots-Tag to the response
  // (headers must be set before the response body is written)
  setResponseHeader(event, 'X-Robots-Tag', 'noai, noimageai')
})

5. createMiddleware vs Nitro middleware
TanStack Start exports a createMiddleware function — but it serves a different purpose than Nitro middleware:
| API | Scope | Use case |
|---|---|---|
| createMiddleware (@tanstack/start) | Server function context | Auth checks, logging, data injection for createServerFn |
| defineEventHandler (h3/Nitro) | Every HTTP request | UA checking, header injection, hard 403 blocking |
For AI bot blocking, always use Nitro middleware (server/middleware/ with defineEventHandler). The createMiddleware API runs only when a server function is invoked — not on page requests or static asset fetches.
// ❌ This does NOT block AI bots on page requests
import { createMiddleware } from '@tanstack/start'
export const authMiddleware = createMiddleware().server(async ({ next }) => {
  // Only runs when a createServerFn() is called
  return await next()
})

// ✅ This blocks AI bots on ALL requests
// server/middleware/bot-block.ts — runs on every HTTP request
import { defineEventHandler } from 'h3'

export default defineEventHandler((event) => {
  // Runs before routing, on every request
})

6. Deployment adapters
TanStack Start uses Nitro's deployment adapters — the same code deploys to multiple targets. AI bot protection works identically across all adapters because the server middleware and routeRules are at the Nitro level.
Configuring the adapter
// app.config.ts
import { defineConfig } from '@tanstack/start/config'
export default defineConfig({
  server: {
    preset: 'vercel', // or 'netlify', 'cloudflare-pages', 'node-server', 'bun'
    routeRules: {
      '/**': {
        headers: { 'X-Robots-Tag': 'noai, noimageai' },
      },
    },
  },
})

| Adapter | robots.txt | Meta tags | X-Robots-Tag | Hard 403 |
|---|---|---|---|---|
| Node server | ✓ | ✓ | ✓ | ✓ |
| Vercel | ✓ | ✓ | ✓ | ✓ |
| Netlify | ✓ | ✓ | ✓ | ✓ |
| Cloudflare Pages | ✓ | ✓ | ✓ | ✓ |
| Bun | ✓ | ✓ | ✓ | ✓ |
Because AI bot protection is implemented at the Nitro server layer (not the adapter layer), it works identically everywhere. The server/middleware/ and routeRules configuration is adapter-agnostic.
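A useful consequence of this adapter-agnosticism: the core decision logic is plain TypeScript, so you can extract it into a pure function and unit-test it without booting any server or adapter. A sketch under that assumption (decide is a hypothetical helper mirroring the combined middleware above; the pattern list is abbreviated):

```typescript
// Pure decision function modeling the middleware: exempt robots.txt,
// return 403 for known AI UAs, otherwise attach X-Robots-Tag.
const AI_BOT_PATTERNS = ['GPTBot', 'ClaudeBot', 'CCBot', 'Google-Extended', 'PerplexityBot', 'Bytespider']
const EXEMPT_PATHS = ['/robots.txt', '/sitemap.xml', '/favicon.ico']

function decide(
  path: string,
  userAgent: string,
): { status: number; headers: Record<string, string> } {
  if (EXEMPT_PATHS.includes(path)) return { status: 200, headers: {} }
  const ua = userAgent.toLowerCase()
  if (AI_BOT_PATTERNS.some((p) => ua.includes(p.toLowerCase()))) {
    return { status: 403, headers: {} }
  }
  return { status: 200, headers: { 'X-Robots-Tag': 'noai, noimageai' } }
}
```

The middleware then reduces to translating this result into h3 calls (throw createError for 403, setResponseHeader otherwise), which keeps the testable logic separate from the HTTP plumbing.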
FAQ
How do I serve robots.txt in TanStack Start?
Place robots.txt in public/ — Vite serves static files from this directory at the root URL automatically. For dynamic content, create server/routes/robots.txt.ts and export a defineEventHandler() that returns text/plain content. Nitro server routes take priority over public/ static files when paths conflict.
How do I add noai meta tags in TanStack Start?
Add a head() option to createRootRoute in app/routes/__root.tsx: head: () => ({ meta: [{ name: "robots", content: "noai, noimageai" }] }). Render <Meta /> in your root component's <head>. For per-route overrides, add head() to individual createFileRoute() calls — leaf route values win for duplicate meta names.
How do I set X-Robots-Tag headers globally in TanStack Start?
Add routeRules to the server config in app.config.ts: server: { routeRules: { "/**": { headers: { "X-Robots-Tag": "noai, noimageai" } } } }. Nitro applies these at the server level before React rendering. Alternatively, set the header directly in Nitro server middleware using event.node.res.setHeader().
How do I hard-block AI bots in TanStack Start?
Create server/middleware/bot-block.ts and export a default defineEventHandler(). Check getHeader(event, 'user-agent') against AI bot patterns. Use throw createError({ statusCode: 403 }) to reject — this stops middleware chain execution and prevents React rendering. Exempt /robots.txt by checking event.path before the UA check.
What is the difference between TanStack Start middleware and Nitro middleware?
createMiddleware from @tanstack/start runs only in the context of createServerFn() calls — not on page requests. Nitro middleware (server/middleware/ with defineEventHandler from h3) runs on every HTTP request before routing. For AI bot blocking, always use Nitro middleware.
Does blocking AI bots affect search engine crawling?
No. The UA patterns in this guide target AI training crawlers specifically (GPTBot, ClaudeBot, CCBot, Google-Extended, PerplexityBot). Googlebot and Bingbot use different user agent strings and are not matched. Always exempt /robots.txt from blocking so crawlers can read your rules, and include explicit Allow: / rules for Googlebot and Bingbot.
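You can sanity-check this claim directly: none of the blocked substrings appear in the search crawlers' user agent strings. A quick check (the UA strings below are representative examples based on the crawlers' published formats):

```typescript
// Substring matching against the block list: search crawler UAs
// contain none of these tokens, so they pass through.
const PATTERNS = ['GPTBot', 'ClaudeBot', 'CCBot', 'Google-Extended', 'PerplexityBot', 'Bytespider', 'Diffbot']

const isAIBot = (ua: string) =>
  PATTERNS.some((p) => ua.toLowerCase().includes(p.toLowerCase()))

const googlebot = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
const bingbot = 'Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)'
const gptbot = 'Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.1; +https://openai.com/gptbot)'
```

Note that "googlebot" does not contain "google-extended", so the case-insensitive substring match leaves Googlebot alone while still catching GPTBot.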
Is your site protected from AI bots?
Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.