Go has no built-in middleware framework — blocking is done by wrapping your http.Handler or using r.Use() in Gin. Compile the bot pattern once at startup with regexp.MustCompile — matching is O(n) per request. For robots.txt, use //go:embed (Go 1.16+) to bake the file into your binary — no runtime file path dependency.
| Method | When to use |
|---|---|
| robots.txt (static file or go:embed) | Always — foundation layer |
| Dynamic /robots.txt handler | Need staging vs. production rules |
| net/http middleware wrapper | Standard library net/http servers |
| Gin middleware (r.Use) | Gin router |
| X-Robots-Tag response header | Complement to robots.txt |
| nginx reverse proxy block | nginx in front of Go server |
Go has no magic static asset handling — you serve robots.txt explicitly. The cleanest approach for deployed binaries is //go:embed, which bakes the file into the binary at compile time.
```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: xAI-Bot
Disallow: /

User-agent: DeepSeekBot
Disallow: /

User-agent: MistralBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: webzio-extended
Disallow: /

User-agent: gemini-deep-research
Disallow: /

User-agent: *
Allow: /
```
The //go:embed directive compiles the file into the binary. No static/ directory needed at runtime — clean for Docker and serverless.
```go
package main

import (
	_ "embed"
	"fmt"
	"net/http"
)

//go:embed static/robots.txt
var robotsTxt string

func main() {
	mux := http.NewServeMux()
	// Serve embedded robots.txt
	mux.HandleFunc("/robots.txt", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; charset=utf-8")
		fmt.Fprint(w, robotsTxt)
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "Hello, World!")
	})
	http.ListenAndServe(":8080", mux)
}
```

If you prefer to serve the file from disk instead of embedding it:

```go
mux := http.NewServeMux()

// Serve entire static/ directory including robots.txt
mux.Handle("/robots.txt", http.FileServer(http.Dir("static/")))

// Or serve just the file:
mux.HandleFunc("/robots.txt", func(w http.ResponseWriter, r *http.Request) {
	http.ServeFile(w, r, "static/robots.txt")
})
```

When you need different rules per environment — block everything in staging, block only AI bots in production — generate the response at runtime.
```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

const aiBotsDisallow = `User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: *
Allow: /`

const blockAll = `User-agent: *
Disallow: /`

func robotsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	if os.Getenv("APP_ENV") == "production" {
		fmt.Fprint(w, aiBotsDisallow)
	} else {
		fmt.Fprint(w, blockAll)
	}
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/robots.txt", robotsHandler)
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "Hello, World!")
	})
	http.ListenAndServe(":8080", mux)
}
```

Go's idiomatic middleware pattern takes an http.Handler and returns an http.Handler. The outer handler intercepts the request — returning 403 for AI bots, or calling next.ServeHTTP(w, r) for everything else. Wrap your entire mux so all routes are protected.
```go
package middleware

import (
	"net/http"
	"regexp"
)

// Compiled once at startup — no per-request compilation cost
var blockedUAs = regexp.MustCompile(
	`(?i)GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|` +
		`Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|` +
		`Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|` +
		`cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|` +
		`webzio-extended|gemini-deep-research`,
)

// BlockAiBots returns a middleware that blocks known AI training crawlers.
// robots.txt is always allowed through so bots can read your opt-out.
func BlockAiBots(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Always serve robots.txt — let bots read the disallow rules
		if r.URL.Path == "/robots.txt" {
			next.ServeHTTP(w, r)
			return
		}
		ua := r.Header.Get("User-Agent")
		if ua != "" && blockedUAs.MatchString(ua) {
			http.Error(w, "Forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

Wire it into your server by wrapping the mux:

```go
package main

import (
	_ "embed"
	"fmt"
	"net/http"

	"yourmodule/middleware"
)

//go:embed static/robots.txt
var robotsTxt string

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/robots.txt", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; charset=utf-8")
		fmt.Fprint(w, robotsTxt)
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "Hello, World!")
	})
	// Wrap the entire mux — all routes are protected
	handler := middleware.BlockAiBots(mux)
	http.ListenAndServe(":8080", handler)
}
```

regexp.MustCompile: Always use regexp.MustCompile at package level — it compiles the pattern once at startup and panics immediately on a bad pattern. Never call regexp.Compile inside the handler function; it re-compiles on every request.
Gin's middleware system uses r.Use() — functions registered there run before every subsequent route handler. Call c.AbortWithStatus(http.StatusForbidden) to stop the chain immediately. Register static file routes before r.Use(), or exempt /robots.txt inside the middleware.
```go
package main

import (
	_ "embed"
	"net/http"
	"regexp"

	"github.com/gin-gonic/gin"
)

//go:embed static/robots.txt
var robotsTxt string

var blockedUAs = regexp.MustCompile(
	`(?i)GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|anthropic-ai|` +
		`Google-Extended|Bytespider|CCBot|PerplexityBot|meta-externalagent|` +
		`Amazonbot|Applebot-Extended|xAI-Bot|DeepSeekBot|MistralBot|Diffbot|` +
		`cohere-ai|AI2Bot|Ai2Bot-Dolma|YouBot|DuckAssistBot|omgili|omgilibot|` +
		`webzio-extended|gemini-deep-research`,
)

func BlockAiBotsMiddleware() gin.HandlerFunc {
	return func(c *gin.Context) {
		// Always allow robots.txt through
		if c.Request.URL.Path == "/robots.txt" {
			c.Next()
			return
		}
		ua := c.Request.UserAgent()
		if ua != "" && blockedUAs.MatchString(ua) {
			c.AbortWithStatus(http.StatusForbidden)
			return
		}
		c.Next()
	}
}

func main() {
	r := gin.Default()
	// robots.txt — registered before the blocking middleware
	r.GET("/robots.txt", func(c *gin.Context) {
		c.Data(http.StatusOK, "text/plain; charset=utf-8", []byte(robotsTxt))
	})
	// Apply AI bot blocking to all subsequent routes
	r.Use(BlockAiBotsMiddleware())
	r.GET("/", func(c *gin.Context) {
		c.String(http.StatusOK, "Hello, World!")
	})
	r.Run(":8080")
}
```

Route registration order: In Gin, routes registered before r.Use() are not wrapped by that middleware. Register GET /robots.txt first, then call r.Use(BlockAiBotsMiddleware()), then all your other routes. Alternatively, keep the c.Request.URL.Path == "/robots.txt" exemption inside the middleware — either approach works.
If you prefer serving from disk rather than go:embed:

```go
// Serve robots.txt from disk
r.StaticFile("/robots.txt", "./static/robots.txt")

// Then apply middleware
r.Use(BlockAiBotsMiddleware())

// Then routes
r.GET("/api/v1/items", itemsHandler)
```

Add X-Robots-Tag: noai, noimageai to all HTML responses via middleware. In Go, set it in your middleware chain before calling next.ServeHTTP.
```go
func XRobotsTag(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("X-Robots-Tag", "noai, noimageai")
		next.ServeHTTP(w, r)
	})
}

// Chain middleware in main:
handler := XRobotsTag(BlockAiBots(mux))
```

The Gin equivalent:

```go
r.Use(func(c *gin.Context) {
	c.Header("X-Robots-Tag", "noai, noimageai")
	c.Next()
})
```

For production, nginx typically sits in front of your Go server as a TLS terminator and reverse proxy. Add user-agent blocking at the nginx layer — the request never reaches your Go process.
```nginx
map $http_user_agent $block_ai_bot {
    default 0;
    ~*GPTBot 1;
    ~*ChatGPT-User 1;
    ~*OAI-SearchBot 1;
    ~*ClaudeBot 1;
    ~*anthropic-ai 1;
    ~*Google-Extended 1;
    ~*Bytespider 1;
    ~*CCBot 1;
    ~*PerplexityBot 1;
    ~*meta-externalagent 1;
    ~*Amazonbot 1;
    ~*Applebot-Extended 1;
    ~*xAI-Bot 1;
    ~*DeepSeekBot 1;
    ~*MistralBot 1;
    ~*Diffbot 1;
    ~*cohere-ai 1;
    ~*AI2Bot 1;
    ~*YouBot 1;
    ~*DuckAssistBot 1;
    ~*omgili 1;
    ~*webzio-extended 1;
    ~*gemini-deep-research 1;
}

server {
    listen 443 ssl;
    server_name yourapp.com;

    ssl_certificate /etc/letsencrypt/live/yourapp.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourapp.com/privkey.pem;

    # Always pass robots.txt through
    location = /robots.txt {
        proxy_pass http://127.0.0.1:8080;
    }

    location / {
        if ($block_ai_bot) {
            return 403 "Forbidden";
        }
        proxy_pass http://127.0.0.1:8080;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

Cloud deployments: For Google Cloud Run, put Cloud Armor (WAF) in front of the Cloud Run service — custom rules can match the User-Agent header and return 403. For AWS ECS or Lambda, use AWS WAF with a custom rule targeting the User-Agent string. Both approaches block at the edge before your Go binary is invoked.
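As a sketch of the Cloud Armor approach, a deny rule can be created from the CLI. The policy name my-edge-policy and the priority 1000 are placeholders, and the expression shows only two bots; extend it with the rest of your list.

```shell
# Placeholder policy name and priority — adjust for your project
gcloud compute security-policies rules create 1000 \
  --security-policy=my-edge-policy \
  --expression="request.headers['user-agent'].contains('GPTBot') || request.headers['user-agent'].contains('ClaudeBot')" \
  --action=deny-403 \
  --description="Block AI crawlers before they reach Cloud Run"
```

Cloud Armor rule expressions use CEL, and header names in request.headers are lowercase.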
| Deployment | Recommended approach |
|---|---|
| Linux VPS + nginx | nginx map block, with Go middleware as backup |
| Linux VPS (no nginx) | Go middleware + robots.txt |
| Docker container | go:embed robots.txt + Go middleware |
| Google Cloud Run | Cloud Armor rule, with Go middleware as backup |
| AWS Lambda (with adapter) | AWS WAF rule, with Go middleware as backup |
| Kubernetes | Go middleware (works behind any ingress) |
The recommended approach for deployed binaries is go:embed — add //go:embed static/robots.txt above a var robotsTxt string declaration and serve it from a handler with fmt.Fprint(w, robotsTxt). The file is compiled into the binary, so no static/ directory is needed at runtime. Alternatively, use http.ServeFile(w, r, "static/robots.txt") or r.StaticFile("/robots.txt", "./static/robots.txt") in Gin.
Wrap the entire mux: http.ListenAndServe(addr, BlockAiBots(mux)). For Gin: call r.Use(BlockAiBotsMiddleware()) after registering /robots.txt but before all other routes. Always exempt /robots.txt from the block so bots can read your disallow rules.
regexp.MustCompile compiles the pattern once at program startup and stores the result as a package-level var. Calling regexp.Compile (or regexp.MustCompile) inside your handler re-compiles the pattern on every request — a significant waste. Package-level MustCompile panics immediately if the pattern is invalid, making misconfiguration visible at startup rather than silently at runtime.
Go 1.16+ supports //go:embed for any file type. The pattern //go:embed static/robots.txt embeds the file content into a string or []byte variable at compile time. The binary is then self-contained — no need to ship a static/ directory alongside it. This is ideal for Docker images and serverless deployments where you want minimal filesystem dependencies.
Two options: (1) Register r.GET("/robots.txt", handler) before calling r.Use(BlockAiBotsMiddleware()) — Gin routes registered before Use() are not wrapped by that middleware. (2) Inside your middleware, add an early return for c.Request.URL.Path == "/robots.txt" before the user-agent check. Option 1 is cleaner; Option 2 is more explicit about intent.