How to Block AI Bots in Go Hertz

Hertz is ByteDance's high-performance HTTP framework for Go. It is built on netpoll, which uses epoll/kqueue-based I/O, rather than on net/http. Middleware has the signature func(ctx context.Context, c *app.RequestContext). c.Request.Header.Get("User-Agent") returns a string, and that string is empty when the header is absent, so there is no nil case to handle. c.AbortWithStatus(403) prevents the remaining handlers in the chain from running, but it does not exit the current function: any code after the call still executes, so return immediately after aborting. The patterns resemble Gin's, but the types are not interchangeable, because Hertz uses its own protocol layer with zero-copy header parsing.

1. Bot detection

The detector is pure Go with no dependencies: strings.ToLower handles case-folding and strings.Contains handles substring matching. Both are safe on an empty string, so no nil-check is needed.

// ai_bot_detector.go — AI bot detection, no external dependencies
package middleware

import "strings"

// aiBotPatterns contains known AI crawler User-Agent substrings.
// All lowercase — compared against a lowercased User-Agent.
var aiBotPatterns = []string{
	"gptbot",
	"chatgpt-user",
	"claudebot",
	"anthropic-ai",
	"ccbot",
	"google-extended",
	"cohere-ai",
	"meta-externalagent",
	"bytespider",
	"omgili",
	"diffbot",
	"imagesiftbot",
	"magpie-crawler",
	"amazonbot",
	"dataprovider",
	"netcraft",
}

// IsAIBot returns true if the User-Agent matches a known AI crawler.
//
// Hertz's c.Request.Header.Get() returns an empty string when the
// header is absent, the same behavior as net/http's Header.Get().
// strings.ToLower("") returns "", so the empty check below is only a
// fast path, not a guard against a crash.
func IsAIBot(userAgent string) bool {
	if userAgent == "" {
		return false
	}
	lower := strings.ToLower(userAgent)
	for _, pattern := range aiBotPatterns {
		if strings.Contains(lower, pattern) {
			return true
		}
	}
	return false
}
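To see how the lowercase-then-substring approach behaves on a few inputs, here is a minimal standalone sketch. The lowercase isAIBot helper and its shortened pattern list are illustrative copies rather than the package code above, and the "acmeccbot-monitor" User-Agent is made up to show that substring matching can also catch agents you did not intend to block.

```go
package main

import (
	"fmt"
	"strings"
)

// patterns is a shortened, illustrative copy of the article's list.
var patterns = []string{"gptbot", "claudebot", "ccbot"}

// isAIBot lowercases the User-Agent once, then checks each pattern
// as a substring. No empty check is needed: ToLower("") is "".
func isAIBot(userAgent string) bool {
	lower := strings.ToLower(userAgent)
	for _, p := range patterns {
		if strings.Contains(lower, p) {
			return true
		}
	}
	return false
}

func main() {
	userAgents := []string{
		"Mozilla/5.0 (compatible; GPTBot/1.0)",          // matches "gptbot"
		"CLAUDEBOT/1.0",                                 // matches regardless of case
		"",                                              // absent header
		"Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0", // ordinary browser
		"acmeccbot-monitor",                             // hypothetical agent containing "ccbot"
	}
	for _, ua := range userAgents {
		fmt.Printf("%-5t %q\n", isAIBot(ua), ua)
	}
}
```

The last case is the trade-off of substring matching: it is simple and tolerant of version suffixes, but any User-Agent that happens to contain a pattern is blocked too.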

2. Middleware and server setup

h.Use() registers global middleware. c.AbortWithStatus() prevents downstream handlers from running. Always return after aborting — Abort does not exit the function.
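The "return after aborting" rule is easier to see in a framework-free model. The sketch below is a simplified, index-based handler chain in the style of Gin-like frameworks. It is an assumption about the general mechanism, not Hertz's actual implementation, and chainCtx, run, blocker, and final are hypothetical names.

```go
package main

import "fmt"

// abortIndex is a sentinel beyond any realistic chain length (arbitrary here).
const abortIndex = 1 << 6

// chainCtx is a toy request context: a handler list, a cursor, and a trace.
type chainCtx struct {
	index    int
	handlers []func(*chainCtx)
	trace    []string
}

// Next runs the remaining handlers in order.
func (c *chainCtx) Next() {
	c.index++
	for c.index < len(c.handlers) {
		c.handlers[c.index](c)
		c.index++
	}
}

// Abort only moves the cursor past the end of the chain.
// It cannot stop the function that called it.
func (c *chainCtx) Abort() { c.index = abortIndex }

// run executes a chain and returns what each handler recorded.
func run(handlers ...func(*chainCtx)) []string {
	c := &chainCtx{index: -1, handlers: handlers}
	c.Next()
	return c.trace
}

// blocker aborts the chain; without the return it falls through.
func blocker(returnAfterAbort bool) func(*chainCtx) {
	return func(c *chainCtx) {
		c.Abort()
		c.trace = append(c.trace, "abort")
		if returnAfterAbort {
			return
		}
		c.trace = append(c.trace, "code after abort")
	}
}

// final stands in for the route handler.
func final(c *chainCtx) { c.trace = append(c.trace, "handler") }

func main() {
	fmt.Println(run(blocker(false), final)) // [abort code after abort]
	fmt.Println(run(blocker(true), final))  // [abort]
	fmt.Println(run(final))                 // [handler]
}
```

In both blocked runs the route handler is skipped, but only the variant that returns avoids running the trailing middleware code.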

// main.go — Hertz server with AI bot blocking middleware
// Install: go get github.com/cloudwego/hertz

package main

import (
	"context"

	"github.com/cloudwego/hertz/pkg/app"
	"github.com/cloudwego/hertz/pkg/app/server"
	"github.com/cloudwego/hertz/pkg/common/utils"
	"github.com/cloudwego/hertz/pkg/protocol/consts"
	"yourmodule/middleware"
)

// ── Middleware ─────────────────────────────────────────────────────────────
//
// Hertz middleware signature: func(ctx context.Context, c *app.RequestContext)
//
// - c.Request.Header.Get("User-Agent") returns string ("" when absent)
// - c.AbortWithStatus(403) stops the handler chain — remaining handlers skipped
// - c.Next(ctx) passes to the next handler in the chain
// - c.Header("X-Robots-Tag", "noai, noimageai") sets a response header
//
// IMPORTANT: c.Abort() does NOT return from the function — code after
// c.AbortWithStatus() still executes. Always "return" after aborting.

func AiBotBlocker() app.HandlerFunc {
	return func(ctx context.Context, c *app.RequestContext) {
		path := string(c.Request.URI().Path())

		// Always allow robots.txt so crawlers discover Disallow rules.
		if path == "/robots.txt" {
			c.Next(ctx)
			return
		}

		// Get User-Agent — returns "" when absent, never nil.
		// Hertz uses []byte internally but Get() returns string.
		ua := c.Request.Header.Get("User-Agent")

		if middleware.IsAIBot(ua) {
			// Block: set X-Robots-Tag, abort with 403, then return.
			c.Header("Content-Type", "text/plain")
			c.Header("X-Robots-Tag", "noai, noimageai")
			c.AbortWithStatus(consts.StatusForbidden)
			return // MUST return — code after Abort still executes
		}

		// Pass: set X-Robots-Tag on the response, then continue chain.
		// Headers set before c.Next() appear on the response regardless
		// of what downstream handlers do (unless they overwrite them).
		c.Header("X-Robots-Tag", "noai, noimageai")
		c.Next(ctx)
	}
}

// ── Handlers ──────────────────────────────────────────────────────────────

const robotsTxt = `User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
`

func main() {
	h := server.Default(server.WithHostPorts("0.0.0.0:8080"))

	// Global middleware — applied to every route.
	h.Use(AiBotBlocker())

	h.GET("/robots.txt", func(ctx context.Context, c *app.RequestContext) {
		c.Header("Content-Type", "text/plain")
		c.String(consts.StatusOK, robotsTxt)
	})

	h.GET("/", func(ctx context.Context, c *app.RequestContext) {
		c.JSON(consts.StatusOK, utils.H{"message": "ok"})
	})

	h.Spin()
}
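The robots.txt constant above lists four crawlers by hand. One way to keep such a list maintainable is to generate the file from a slice of agent names. This is a minimal sketch: buildRobotsTxt and blockedAgents are hypothetical names, and the exact product tokens each crawler honors should be checked against that crawler's own documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// blockedAgents holds robots.txt product tokens to disallow.
// The four entries match the constant in the article; extend as needed.
var blockedAgents = []string{"GPTBot", "ClaudeBot", "CCBot", "Google-Extended"}

// buildRobotsTxt allows everything by default, then adds one
// "Disallow: /" group per listed agent.
func buildRobotsTxt(agents []string) string {
	var b strings.Builder
	b.WriteString("User-agent: *\nAllow: /\n")
	for _, agent := range agents {
		fmt.Fprintf(&b, "\nUser-agent: %s\nDisallow: /\n", agent)
	}
	return b.String()
}

func main() {
	fmt.Print(buildRobotsTxt(blockedAgents))
}
```

With the four agents shown, the generated text matches the hand-written constant, so the result could be computed once at startup and served from the /robots.txt handler.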

3. Route-group middleware

Use h.Group() to scope middleware to a path prefix. Only routes registered on the group get bot blocking — public routes remain unaffected.

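// Route-group middleware: protect only /api/* routes.
//
// Hertz route groups work like Gin: group.Use() applies middleware
// only to routes registered on that group.
//
// indexHandler, aboutHandler, robotsHandler, dataHandler, and usersHandler
// are assumed to be app.HandlerFunc values defined elsewhere.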

func main() {
	h := server.Default(server.WithHostPorts("0.0.0.0:8080"))

	// Public routes — no bot blocking
	h.GET("/", indexHandler)
	h.GET("/about", aboutHandler)
	h.GET("/robots.txt", robotsHandler)

	// API routes — bot blocking applied
	api := h.Group("/api")
	api.Use(AiBotBlocker())  // only /api/* gets blocked
	{
		api.GET("/data", dataHandler)
		api.GET("/users", usersHandler)
	}

	h.Spin()
}
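If you ever need the same scoping inside a global middleware instead of a group (for example, to protect several unrelated prefixes), a plain prefix check can approximate it. The sketch below is a hand-rolled helper, not how Hertz's router matches groups, and underPrefix is a hypothetical name. It matches on a path-segment boundary so that "/apiary" is not treated as part of "/api".

```go
package main

import (
	"fmt"
	"strings"
)

// underPrefix reports whether path is the prefix itself or nested below it.
// A bare strings.HasPrefix(path, "/api") would also match "/apiary".
func underPrefix(path, prefix string) bool {
	prefix = strings.TrimSuffix(prefix, "/")
	return path == prefix || strings.HasPrefix(path, prefix+"/")
}

func main() {
	for _, p := range []string{"/api/data", "/api", "/apiary", "/about"} {
		fmt.Printf("%-10s %t\n", p, underPrefix(p, "/api"))
	}
}
```

A global middleware could call such a helper on the request path and skip the bot check when it returns false.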

4. JSON error response

c.AbortWithStatusJSON() sets the status code and marshals a JSON body in one call — useful for API endpoints that should return structured error responses.

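// Abort with a JSON error body instead of empty 403.
//
// c.AbortWithMsg() sets both status and body.
// c.AbortWithStatusJSON() sets status + JSON body (convenience).
//
// Unlike AiBotBlocker above, this variant has no /robots.txt exemption,
// so scope it to an API route group rather than registering it globally.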

func AiBotBlockerJSON() app.HandlerFunc {
	return func(ctx context.Context, c *app.RequestContext) {
		ua := c.Request.Header.Get("User-Agent")

		if middleware.IsAIBot(ua) {
			c.Header("X-Robots-Tag", "noai, noimageai")
			// AbortWithStatusJSON marshals the body as JSON automatically.
			c.AbortWithStatusJSON(consts.StatusForbidden, utils.H{
				"error":   "forbidden",
				"message": "AI crawlers are not permitted",
			})
			return
		}

		c.Header("X-Robots-Tag", "noai, noimageai")
		c.Next(ctx)
	}
}
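utils.H is a map, and the key order of a marshaled map depends on the JSON encoder in use. If clients or tests compare the body byte for byte, a small struct gives a fixed field order. The sketch below uses the standard encoding/json package to show the resulting body. blockedBody and encodeBlocked are hypothetical names, and it assumes AbortWithStatusJSON accepts any JSON-marshalable value, as its use with utils.H suggests.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// blockedBody is a typed alternative to a map for the 403 payload.
// Struct fields are encoded in declaration order.
type blockedBody struct {
	Error   string `json:"error"`
	Message string `json:"message"`
}

// encodeBlocked returns the JSON text for the blocked-crawler response.
func encodeBlocked() (string, error) {
	b, err := json.Marshal(blockedBody{
		Error:   "forbidden",
		Message: "AI crawlers are not permitted",
	})
	return string(b), err
}

func main() {
	body, err := encodeBlocked()
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // {"error":"forbidden","message":"AI crawlers are not permitted"}
}
```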

Key points

Framework comparison — Go HTTP middleware models

| Framework | Protocol layer | Block request | Header access ("" when absent) |
|-----------|----------------|---------------|--------------------------------|
| Hertz | netpoll (own protocol) | c.AbortWithStatus(403) | c.Request.Header.Get() |
| Gin | net/http | c.AbortWithStatus(403) | c.GetHeader() |
| Fiber | fasthttp (own protocol) | return c.SendStatus(403) | c.Get("User-Agent") |
| Echo | net/http | Return echo.NewHTTPError(403) | c.Request().Header.Get() |

Hertz and Fiber both bypass net/http for performance (netpoll and fasthttp respectively), trading standard library compatibility for throughput. Gin and Echo use net/http and are compatible with any http.Handler middleware. For bot detection (pure string matching, no I/O), the performance difference is negligible — the choice depends on your existing stack.
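For comparison with the net/http-based rows, the same check can be written as a plain http.Handler wrapper using only the standard library. This is a minimal sketch, not taken from Gin or Echo: blockAIBots, isAIBot, and statusFor are hypothetical names, the pattern list is shortened, and httptest is used so the example runs without opening a port.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// patterns is a shortened, illustrative copy of the article's list.
var patterns = []string{"gptbot", "claudebot", "ccbot"}

func isAIBot(ua string) bool {
	lower := strings.ToLower(ua)
	for _, p := range patterns {
		if strings.Contains(lower, p) {
			return true
		}
	}
	return false
}

// blockAIBots wraps next. In net/http there is no Abort: the chain
// stops simply because next.ServeHTTP is never called.
func blockAIBots(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("X-Robots-Tag", "noai, noimageai")
		if r.URL.Path != "/robots.txt" && isAIBot(r.UserAgent()) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one in-memory request and returns the status code.
func statusFor(path, ua string) int {
	h := blockAIBots(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	req := httptest.NewRequest(http.MethodGet, path, nil)
	req.Header.Set("User-Agent", ua)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("/", "GPTBot/1.0"))           // 403
	fmt.Println(statusFor("/", "Mozilla/5.0"))          // 200
	fmt.Println(statusFor("/robots.txt", "GPTBot/1.0")) // 200
}
```

The detection logic is identical across frameworks. Only the way a request is rejected and the way the header is read change.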