
How to Block AI Bots on Svelte: Complete 2026 Guide

Svelte compiled with Vite produces a pure SPA: a static index.html shell and JavaScript bundles. There is no Node server, no request handler, no middleware. This guide covers every blocking method that works within that constraint — from public/robots.txt to Cloudflare Pages _worker.js to nginx hard blocking.

Svelte SPA vs SvelteKit

This guide covers Svelte + Vite (plain SPA — no server, no SSR). If you are using SvelteKit, see the SvelteKit guide — it has server-side rendering, hooks.server.ts middleware, and much more powerful blocking options.

Methods at a glance

| Method | What it does | Blocks JS-less bots? |
| --- | --- | --- |
| public/robots.txt | Signals crawlers to stay out | Signal only |
| index.html noai tag | Opts out of AI training (all crawlers) | ✓ (pre-JS HTML) |
| <svelte:head> noai | Per-component noai tag | ✗ (JS-only) |
| X-Robots-Tag header | noai via HTTP header (all pages) | ✓ (header) |
| nginx map + return 403 | Hard block at reverse proxy | ✓ |
| Cloudflare Pages _worker.js | Hard block at edge (CF Pages only) | ✓ |
| Vercel Edge Middleware | Hard block at edge (Vercel only) | ✓ |
| Netlify Edge Functions | Hard block at edge (Netlify only) | ✓ |

1. robots.txt — public/robots.txt

Vite copies everything in public/ verbatim to the build output at the root level. A public/robots.txt file becomes dist/robots.txt and is served at /robots.txt with no extra config.

# public/robots.txt
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: Webzio
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralAI
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: gemini-deep-research
Disallow: /

User-agent: *
Allow: /

Vite public/ vs src/assets/

Only files in public/ are copied verbatim. Files in src/assets/ go through the Vite build pipeline and get content-hashed filenames — robots.txt must be in public/.
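Because the same bot list reappears in the nginx map and every edge handler later in this guide, it can help to keep one canonical array and generate public/robots.txt from it before each build. A minimal sketch, assuming a hypothetical scripts/gen-robots.mjs wired up as an npm "prebuild" script (the filename and script hook are illustrative, not part of any Vite convention):

```javascript
// scripts/gen-robots.mjs — hypothetical helper; regenerates public/robots.txt
// from one canonical bot list so robots.txt and UA regexes never drift apart
import { mkdirSync, writeFileSync } from 'node:fs';

const AI_BOTS = [
  'GPTBot', 'ChatGPT-User', 'OAI-SearchBot', 'ClaudeBot', 'Claude-Web',
  'anthropic-ai', 'Google-Extended', 'Bytespider', 'CCBot', 'meta-externalagent',
  'Amazonbot', 'Applebot-Extended', 'PerplexityBot', 'cohere-ai', 'YouBot',
  'DuckAssistBot', 'Diffbot', 'omgilibot', 'omgili', 'Webzio', 'AI2Bot',
  'DeepSeekBot', 'MistralAI', 'xAI-Bot', 'gemini-deep-research',
];

// One Disallow stanza per bot, then the catch-all Allow for everyone else
const robots =
  AI_BOTS.map((bot) => `User-agent: ${bot}\nDisallow: /`).join('\n\n') +
  '\n\nUser-agent: *\nAllow: /\n';

mkdirSync('public', { recursive: true });
writeFileSync('public/robots.txt', robots);

// The same list doubles as the UA regex used by the edge handlers later on
export const BLOCKED_UAS = new RegExp(AI_BOTS.join('|'), 'i');
```

Run it via `"prebuild": "node scripts/gen-robots.mjs"` in package.json and `npm run build` picks it up automatically.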

2. noai meta tag in index.html

In a Vite project, index.html lives at the project root and is the entry point Vite processes. Unlike the <svelte:head> block in your components (which requires JavaScript to execute), tags you add directly to index.html are in the raw HTML response — crawlers see them immediately, before any JavaScript runs.

This is the most reliable way to deliver noai to non-JS crawlers from a Svelte SPA.

<!-- index.html (project root — Vite entry point) -->
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />

    <!-- AI training opt-out — visible before JavaScript runs -->
    <meta name="robots" content="noai, noimageai" />

    <title>My App</title>
  </head>
  <body>
    <div id="app"></div>
    <script type="module" src="/src/main.ts"></script>
  </body>
</html>

Why index.html beats <svelte:head> for global noai

A Svelte SPA is client-rendered: the server sends a nearly-empty HTML shell and Svelte injects content after JavaScript loads. Any <svelte:head> tags — including a noai meta — are injected by JS at runtime. A bot that doesn't execute JavaScript sees <div id="app"></div> and nothing else. Adding noai directly to index.html ensures every HTML response carries it, regardless of JavaScript.

Per-page noai is not possible in a plain SPA

A Svelte SPA has a single index.html served for every route via the HTML5 History API fallback. You cannot serve a different <head> per route without a server layer. If per-route control matters, use SvelteKit.

3. <svelte:head> meta tags (JS-only)

<svelte:head> lets any Svelte component inject into <head>. In SPA mode this runs after JavaScript loads — useful for dynamic title/description updates, but invisible to non-JS crawlers. Use it for secondary coverage, not as your primary noai mechanism.

<!-- App.svelte — global noai via svelte:head (JS-only fallback) -->
<script lang="ts">
  // Component logic
</script>

<svelte:head>
  <meta name="robots" content="noai, noimageai" />
</svelte:head>

<main>
  <!-- your app content -->
</main>

A per-route component can override the tag at runtime (still JS-only, so non-JS crawlers never see either value):

<!-- Per-route component — override if a specific page allows AI -->
<svelte:head>
  <meta name="robots" content="index, follow" />
</svelte:head>

Caveat: SPA vs SSR behaviour

In SvelteKit with SSR enabled, <svelte:head> tags are rendered server-side and appear in the HTML source — bots see them. In a Svelte SPA (Vite, no SSR), they are injected at runtime by JavaScript — bots that don't run JS never see them. This is the fundamental difference between the two setups.

4. X-Robots-Tag response header

The X-Robots-Tag HTTP header works like the <meta name="robots"> tag but is set at the server or CDN layer — no HTML changes required. It applies to every response, including non-HTML files.

Netlify (netlify.toml)

# netlify.toml
[[headers]]
  for = "/*"
  [headers.values]
    X-Robots-Tag = "noai, noimageai"

Vercel (vercel.json)

{
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        { "key": "X-Robots-Tag", "value": "noai, noimageai" }
      ]
    }
  ]
}

Cloudflare Pages (_headers)

# public/_headers (copied to dist/ by Vite)
/*
  X-Robots-Tag: noai, noimageai

nginx

server {
    listen 80;
    server_name example.com;
    root /var/www/html/dist;

    add_header X-Robots-Tag "noai, noimageai" always;

    location / {
        try_files $uri $uri/ /index.html;
    }
}

5. nginx hard blocking (return 403)

If you self-host your Svelte build behind nginx, a map block lets you match User-Agent strings and return 403 before any static file is served. Define the bot list once in the map block; nginx evaluates the match per request with negligible overhead.

# nginx.conf (http block)
map $http_user_agent $block_ai_bot {
    default                 0;
    ~*GPTBot                1;
    ~*ChatGPT-User          1;
    ~*OAI-SearchBot         1;
    ~*ClaudeBot             1;
    ~*Claude-Web            1;
    ~*anthropic-ai          1;
    ~*Google-Extended       1;
    ~*Bytespider            1;
    ~*CCBot                 1;
    ~*meta-externalagent    1;
    ~*Amazonbot             1;
    ~*Applebot-Extended     1;
    ~*PerplexityBot         1;
    ~*cohere-ai             1;
    ~*YouBot                1;
    ~*DuckAssistBot         1;
    ~*Diffbot               1;
    ~*omgilibot             1;
    ~*omgili                1;
    ~*Webzio                1;
    ~*AI2Bot                1;
    ~*DeepSeekBot           1;
    ~*MistralAI             1;
    ~*xAI-Bot               1;
    ~*gemini-deep-research  1;
}

server {
    listen 443 ssl;
    server_name example.com;
    root /var/www/html/dist;

    # Always serve robots.txt — blocking it hides your rules from crawlers
    location = /robots.txt {
        try_files $uri =404;
    }

    # Block matched bots
    location / {
        if ($block_ai_bot) {
            return 403 "Forbidden";
        }
        try_files $uri $uri/ /index.html;
    }
}

nginx if() inside location

nginx's if directive is widely discouraged ("If is Evil"), but that warning concerns rewrites and other directives inside location blocks; return inside if is one of the patterns the nginx documentation explicitly lists as safe, so the config above is production-safe. If you need IP-based rather than User-Agent-based blocking, the geo directive with allow/deny is the idiomatic tool.

6. Cloudflare Pages — _worker.js

Cloudflare Pages supports a special _worker.js file that runs as a Cloudflare Worker before static assets are served. This gives you server-like middleware for a fully static Svelte SPA — no Node server required.

Place _worker.js in public/ and Vite will copy it to dist/. Cloudflare Pages picks it up automatically when it finds _worker.js at the root of the output directory.

// public/_worker.js
// Runs at the Cloudflare edge before static assets are served

const BLOCKED_UAS = /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|meta-externalagent|Amazonbot|Applebot-Extended|PerplexityBot|cohere-ai|YouBot|DuckAssistBot|Diffbot|omgilibot|omgili|Webzio|AI2Bot|DeepSeekBot|MistralAI|xAI-Bot|gemini-deep-research/i;

export default {
  async fetch(request, env) {
    const url = new URL(request.url);

    // Always serve robots.txt — never block it
    if (url.pathname === '/robots.txt') {
      return env.ASSETS.fetch(request);
    }

    const ua = request.headers.get('User-Agent') || '';

    if (BLOCKED_UAS.test(ua)) {
      return new Response('Forbidden', {
        status: 403,
        headers: { 'Content-Type': 'text/plain' },
      });
    }

    // Pass through to Cloudflare Pages static asset serving
    return env.ASSETS.fetch(request);
  },
};

env.ASSETS — Cloudflare Pages asset binding

env.ASSETS is the Cloudflare Pages asset binding that serves your static files. It handles the SPA fallback (serving index.html for unknown paths) automatically. You must call env.ASSETS.fetch(request) — not a plain fetch(request) — to reach static files.

_worker.js vs Functions/ directory

Cloudflare Pages also supports a functions/ directory for route-specific handlers. Use _worker.js when you want a single handler for ALL requests (including the asset fallback). Use functions/ when you need per-route logic alongside static asset serving.
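Before deploying, the routing logic in _worker.js can be smoke-tested in plain Node (18+, which ships the Request and Response globals) by stubbing the ASSETS binding. A minimal sketch with a shortened regex standing in for the full list:

```javascript
// check-worker.mjs — local sanity check for the _worker.js logic (Node 18+)
// The short regex and the fake ASSETS binding are stand-ins for the real ones.
const BLOCKED_UAS = /GPTBot|ClaudeBot|CCBot|Bytespider/i;

const worker = {
  async fetch(request, env) {
    const url = new URL(request.url);
    if (url.pathname === '/robots.txt') return env.ASSETS.fetch(request);
    const ua = request.headers.get('User-Agent') || '';
    if (BLOCKED_UAS.test(ua)) return new Response('Forbidden', { status: 403 });
    return env.ASSETS.fetch(request);
  },
};

// Stub that answers 200 for anything the worker passes through
const env = { ASSETS: { fetch: async () => new Response('asset', { status: 200 }) } };

const bot = new Request('https://example.com/', { headers: { 'User-Agent': 'GPTBot/1.0' } });
const human = new Request('https://example.com/', { headers: { 'User-Agent': 'Mozilla/5.0' } });
const robots = new Request('https://example.com/robots.txt', { headers: { 'User-Agent': 'GPTBot/1.0' } });

const results = await Promise.all([
  worker.fetch(bot, env),
  worker.fetch(human, env),
  worker.fetch(robots, env),
]);
console.log(results.map((r) => r.status)); // [ 403, 200, 200 ]
```

Run with node check-worker.mjs; the bot gets 403, the browser gets the asset, and robots.txt stays reachable even for the bot.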

7. Vercel — Edge Middleware

For Svelte SPAs deployed to Vercel (not SvelteKit), create a middleware.js (or middleware.ts) at the project root. Vercel runs it at the edge before serving static files.

// middleware.js (project root — NOT in src/ or public/)
// Vercel deploys this as Edge Middleware automatically

import { next } from '@vercel/edge';

const BLOCKED_UAS = /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|meta-externalagent|Amazonbot|Applebot-Extended|PerplexityBot|cohere-ai|YouBot|DuckAssistBot|Diffbot|omgilibot|omgili|Webzio|AI2Bot|DeepSeekBot|MistralAI|xAI-Bot|gemini-deep-research/i;

export default function middleware(request) {
  const { pathname } = new URL(request.url);

  // Always serve robots.txt
  if (pathname === '/robots.txt') {
    return next();
  }

  const ua = request.headers.get('user-agent') || '';

  if (BLOCKED_UAS.test(ua)) {
    return new Response('Forbidden', { status: 403 });
  }

  return next();
}

export const config = {
  // Run on every route except Vite's hashed /assets output and the favicon
  matcher: ['/((?!assets/|favicon.ico).*)'],
};

Vite + Vercel note

Edge Middleware is not Next.js-only: Vercel picks up a root-level middleware.js for any framework. Outside Next.js the request is a standard Request (there is no request.nextUrl), so parse the path with new URL(request.url). Install @vercel/edge (npm i @vercel/edge) for the next() helper that tells the edge to continue to your static files; it is bundled into the middleware only, never into your Svelte app.

8. Netlify — Edge Functions

Netlify's _headers file only supports header injection — it cannot return a 403 based on User-Agent. For hard blocking on Netlify, use an Edge Function.

// netlify/edge-functions/block-ai-bots.js
const BLOCKED_UAS = /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|Bytespider|CCBot|meta-externalagent|Amazonbot|Applebot-Extended|PerplexityBot|cohere-ai|YouBot|DuckAssistBot|Diffbot|omgilibot|omgili|Webzio|AI2Bot|DeepSeekBot|MistralAI|xAI-Bot|gemini-deep-research/i;

export default async (request, context) => {
  const url = new URL(request.url);

  if (url.pathname === '/robots.txt') {
    return context.next();
  }

  const ua = request.headers.get('user-agent') || '';

  if (BLOCKED_UAS.test(ua)) {
    return new Response('Forbidden', { status: 403 });
  }

  return context.next();
};

export const config = { path: '/*' };

The inline config export registers the function on every path. If you prefer to keep routing in config files, declare it in netlify.toml instead:

# netlify.toml — wire the edge function
[[edge_functions]]
  function = "block-ai-bots"
  path = "/*"
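As with the Cloudflare worker, the handler's logic can be exercised locally in Node (18+) by stubbing context.next(). A sketch with a shortened regex standing in for the full list:

```javascript
// check-edge-fn.mjs — local smoke test for the Netlify edge function logic (Node 18+)
// The short regex and the stubbed context stand in for the real deployment.
const BLOCKED_UAS = /GPTBot|ClaudeBot|CCBot|Bytespider/i;

const handler = async (request, context) => {
  const url = new URL(request.url);
  if (url.pathname === '/robots.txt') return context.next();
  const ua = request.headers.get('user-agent') || '';
  if (BLOCKED_UAS.test(ua)) return new Response('Forbidden', { status: 403 });
  return context.next();
};

// context.next() normally forwards to the next handler or the static asset
const context = { next: async () => new Response('page', { status: 200 }) };

const blocked = await handler(
  new Request('https://example.com/', { headers: { 'user-agent': 'CCBot/2.0' } }),
  context,
);
const allowed = await handler(
  new Request('https://example.com/', { headers: { 'user-agent': 'Mozilla/5.0' } }),
  context,
);

console.log(blocked.status, allowed.status); // 403 200
```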

Svelte SPA vs SvelteKit — blocking comparison

| Feature | Svelte SPA (Vite) | SvelteKit |
| --- | --- | --- |
| robots.txt | public/robots.txt | static/robots.txt |
| Global noai tag | index.html (root) — pre-JS | app.html (src/) — pre-JS |
| Per-route noai | Not possible (single index.html) | <svelte:head> — SSR-rendered |
| Do bots see <svelte:head>? | ✗ (JS-only) | ✓ (SSR-rendered in HTML) |
| Hard 403 server-side | No (no server layer) | hooks.server.ts handle() |
| Hard 403 edge/CDN | Cloudflare _worker.js / Vercel Edge / Netlify Edge Fn | Adapter + hooks, or the same edge options |
| X-Robots-Tag | netlify.toml / vercel.json / nginx / _headers | hooks.server.ts or same hosting config |
| Middleware concept | None — pure static output | hooks.server.ts handle() |

Key takeaway: SvelteKit renders <svelte:head> server-side so bots see it in HTML; Svelte SPA does not. For per-route noai or server-side hard blocking, migrate to SvelteKit.

Hosting comparison

| Host | robots.txt | X-Robots-Tag | Hard 403 |
| --- | --- | --- | --- |
| Cloudflare Pages | public/robots.txt ✓ | _headers ✓ | _worker.js ✓ |
| Vercel | public/robots.txt ✓ | vercel.json ✓ | middleware.js ✓ |
| Netlify | public/robots.txt ✓ | netlify.toml ✓ | Edge Function ✓ |
| GitHub Pages | public/robots.txt ✓ | ✗ (no header config) | Cloudflare WAF proxy ✓ |
| AWS S3 + CloudFront | public/robots.txt ✓ | CloudFront Response Headers Policy ✓ | CloudFront Function ✓ |
| nginx (self-hosted) | public/robots.txt ✓ | add_header ✓ | map + return 403 ✓ |

FAQ

Does robots.txt block AI bots on a Svelte SPA?

robots.txt signals which crawlers to disallow, but most AI training bots ignore it. For guaranteed blocking you need hard server-level enforcement via nginx, a Cloudflare Pages Worker, or a Vercel Edge Function.

Does <svelte:head> work for blocking AI bots?

In a Svelte SPA, <svelte:head> tags are injected by JavaScript at runtime — non-JS crawlers never see them. Use index.html (Vite project root) instead: the noai meta tag there is in every HTML response before JavaScript runs.

What is the difference between Svelte and SvelteKit for AI bot blocking?

SvelteKit has server-side rendering and a hooks.server.ts handle() hook that can return a 403 before any content is sent. Svelte SPA (Vite) has no server layer — hard blocking requires nginx, a Cloudflare Pages _worker.js, or a Vercel Edge Function. SvelteKit <svelte:head> is rendered server-side so bots see it; Svelte SPA <svelte:head> is client-side only.

What is the best way to block AI bots on Cloudflare Pages with Svelte?

Add a _worker.js file to your public/ directory. Cloudflare Pages runs it as an edge Worker before serving static assets. Check the User-Agent header in the fetch handler and return a 403 Response for matched bots — exempt /robots.txt so it stays accessible.

Can I use a Service Worker to block AI bots in a Svelte SPA?

No. Service Workers only intercept requests made by browsers that have already loaded the Service Worker registration script. A bot fetching your page for the first time will not have a Service Worker installed and will bypass it entirely.

Do I need to change anything in vite.config.ts to serve robots.txt?

No. Vite copies public/ to dist/ verbatim as part of every build. No plugin or config change is required for robots.txt to be served at /robots.txt after deploy.
