Cohere · Undocumented · Likely Training

How to Block cohere-ai: Cohere's Undocumented Web Crawler

cohere-ai crawls your site without any official documentation explaining what it collects or why. It's operated by Cohere — the enterprise AI lab behind Command R. Only ~13% of major websites block it.

Updated March 2026

Why "Undocumented" Matters

Most major AI companies publish documentation explaining their crawlers. OpenAI documents GPTBot, Anthropic documents ClaudeBot, Google documents Google-Extended. Cohere has published no official documentation for cohere-ai — no help page, no blog post, no developer docs explaining what it does or how it uses collected data.

This lack of transparency means publishers must infer the crawler's purpose from Cohere's business model and observed behavior. When in doubt, blocking is the conservative choice.

What We Know About cohere-ai

Cohere is a Canadian-American AI company founded in 2019, focused on enterprise AI. Its products include Command R and Command R+ (retrieval-augmented generation models), the Aya multilingual model family, and Embed (embedding models for semantic search). Cohere's customers are primarily enterprises — banks, healthcare companies, and tech firms.

The cohere-ai crawler has been identified through server log analysis by security researchers and bot tracking services. Based on Cohere's business (building language and embedding models), the crawler likely serves one or both of these purposes:

  • Training data collection: crawling web content to train or fine-tune Cohere's language models (Command R, Aya)
  • Live retrieval / RAG: fetching web pages in real time when Cohere's AI products need web context for answers

The user agent string is: Mozilla/5.0 (compatible; cohere-ai/1.0; +http://www.cohere.ai/bot.html)

How to Block cohere-ai

Add this to your robots.txt:

robots.txt: Block cohere-ai
User-agent: cohere-ai
Disallow: /
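Before deploying the rule, you can sanity-check it with Python's standard-library `urllib.robotparser`. This is a minimal sketch: robotparser implements the classic robots.txt matching rules, and how cohere-ai itself parses the file is not documented. A passing check therefore confirms your syntax, not the crawler's behavior.

```python
from urllib.robotparser import RobotFileParser

# The rule from above, as it would appear in /robots.txt
ROBOTS_TXT = """\
User-agent: cohere-ai
Disallow: /
"""

parser = RobotFileParser()
# parse() takes an iterable of lines and marks the rules as loaded,
# so can_fetch() answers from them instead of refusing everything.
parser.parse(ROBOTS_TXT.splitlines())

# Pass the product token ("cohere-ai"). robotparser only looks at the text
# before the first "/" of the agent name, so the full "Mozilla/5.0 (...)"
# string would not match the rule.
cohere_allowed = parser.can_fetch("cohere-ai", "https://example.com/blog/post")
google_allowed = parser.can_fetch("Googlebot", "https://example.com/blog/post")

print(f"cohere-ai allowed: {cohere_allowed}")  # cohere-ai allowed: False
print(f"Googlebot allowed: {google_allowed}")  # Googlebot allowed: True
```

The Googlebot line shows that a rule scoped to `cohere-ai` leaves search crawlers untouched.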

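robots.txt is advisory: it only keeps out crawlers that choose to honor it. Because cohere-ai is undocumented, Cohere has not confirmed that it does, so consider adding server-level enforcement as well: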

nginx: Block by user agent
# Place inside a server { } or location { } block; ~* is a case-insensitive regex match
if ($http_user_agent ~* "cohere-ai") {
    return 403;
}
Cloudflare WAF: Custom rule
Field: User Agent
Operator: contains
Value: cohere-ai
Action: Block
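If your site is served directly by a Python application rather than behind nginx or Cloudflare, the same user-agent check can live in the app. The sketch below is a hypothetical WSGI middleware (`BlockCohereMiddleware`, `hello_app`, and `status_for` are illustrative names, not part of any library). It mirrors the nginx rule: a case-insensitive match on "cohere-ai" gets a 403, and every other request passes through.

```python
import re
from typing import Callable, Iterable

# Case-insensitive match on "cohere-ai", mirroring nginx's `~* "cohere-ai"`
BLOCKED_UA = re.compile(r"cohere-ai", re.IGNORECASE)


class BlockCohereMiddleware:
    """Hypothetical WSGI middleware: answers 403 when the User-Agent
    contains "cohere-ai" and passes every other request to the wrapped app."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_UA.search(user_agent):
            body = b"Forbidden"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
            )
            return [body]
        return self.app(environ, start_response)


def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in for your real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


app = BlockCohereMiddleware(hello_app)


def status_for(user_agent: str) -> str:
    """Send one fake request through the middleware and return its status line."""
    captured = []
    app({"HTTP_USER_AGENT": user_agent}, lambda status, headers: captured.append(status))
    return captured[0]


print(status_for("Mozilla/5.0 (compatible; cohere-ai/1.0; +http://www.cohere.ai/bot.html)"))  # 403 Forbidden
print(status_for("Mozilla/5.0 (X11; Linux x86_64)"))  # 200 OK
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve `app`. Like the nginx and Cloudflare rules, this only stops clients that identify themselves honestly in the User-Agent header.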

Why Only 13% of Sites Block cohere-ai

The low blocking rate isn't because cohere-ai is safe — it's because most publishers don't know it exists.

No media coverage
GPTBot and ClaudeBot launched with press releases and blog posts. cohere-ai was discovered through server log analysis: no announcement, no documentation, no media coverage.

Enterprise focus obscures awareness
Cohere is primarily B2B. It doesn't have a consumer-facing AI product like ChatGPT or Claude, so publishers don't encounter it as a product they need to worry about.

Not in standard block lists
Many robots.txt templates for AI blocking focus on the well-known crawlers. cohere-ai is often missing from popular "block all AI bots" templates and guides.

What Blocking Does (and Doesn't) Do

What it stops
  • Cohere from crawling your content going forward
  • New content from entering Cohere's training pipeline
  • Live retrieval of your pages for Cohere's AI products
What it doesn't stop
  • Content Cohere has already crawled
  • Other AI crawlers (GPTBot, ClaudeBot, etc.)
  • Cohere accessing your content via Common Crawl or data brokers
  • Google or Bing rankings (unaffected)

Frequently Asked Questions

Does cohere-ai respect robots.txt?

Based on available evidence, it appears to. Cohere is a venture-backed company whose enterprise customers (including banks and healthcare firms) expect compliance. However, because the crawler is undocumented, this cannot be officially confirmed. For enforcement that doesn't depend on the crawler's cooperation, add server-level blocking.
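Since compliance can't be confirmed from documentation, your own access logs are the best evidence. The sketch below assumes the combined log format that nginx and Apache use by default, and the sample lines are made up for illustration. RFC 9309 says crawlers should not rely on a cached robots.txt for more than 24 hours, so the hypothetical `since` parameter lets you ignore requests from the first day after the rule went live. A spoofed user agent will also show up here, so a hit is a lead to investigate, not proof of what Cohere did.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Combined Log Format:
# ip ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def cohere_fetches(lines, since=None):
    """Count paths requested by user agents containing "cohere-ai",
    optionally ignoring entries logged before `since` (an aware datetime)."""
    paths = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "cohere-ai" not in match.group("ua").lower():
            continue
        if since is not None:
            logged_at = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
            if logged_at < since:
                continue
        paths[match.group("path")] += 1
    return paths


def non_robots_fetches(lines, since=None):
    """Everything except /robots.txt: requests the Disallow rule should have stopped."""
    return {path: n for path, n in cohere_fetches(lines, since).items() if path != "/robots.txt"}


# Made-up sample lines (documentation IP ranges), for illustration only
COHERE_UA = "Mozilla/5.0 (compatible; cohere-ai/1.0; +http://www.cohere.ai/bot.html)"
SAMPLE = [
    f'203.0.113.7 - - [10/Mar/2026:12:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 34 "-" "{COHERE_UA}"',
    f'203.0.113.7 - - [12/Mar/2026:09:30:00 +0000] "GET /blog/post HTTP/1.1" 403 9 "-" "{COHERE_UA}"',
    '198.51.100.2 - - [12/Mar/2026:09:31:00 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

# Suppose the Disallow rule went live on 10 March at 12:00 UTC; allow 24 hours for caching.
rule_live_plus_24h = datetime(2026, 3, 11, 12, 0, tzinfo=timezone.utc)
print(non_robots_fetches(SAMPLE, since=rule_live_plus_24h))  # {'/blog/post': 1}
```

On a real server you would pass an open log file instead of `SAMPLE`; the log location depends on your setup. A 403 status on a flagged line, as in the second sample, means server-level blocking caught a request that robots.txt alone did not stop.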

Is Cohere different from OpenAI and Anthropic?

Yes. Cohere is primarily B2B — it sells AI infrastructure to enterprises for internal use cases (document search, summarization, customer support). It doesn't have a major consumer product like ChatGPT or Claude. This enterprise focus means your content may end up powering internal corporate AI tools rather than a public chatbot.

Does blocking cohere-ai affect my SEO?

No. cohere-ai has no relationship with Google, Bing, or any search engine. Blocking it has zero effect on your search rankings or visibility.

Should I block cohere-ai if I already block GPTBot and ClaudeBot?

If your policy is to block AI training crawlers, then yes. cohere-ai likely serves a similar training purpose. The lack of documentation makes it a higher-risk crawler to leave unblocked — you don't know exactly what it's doing with your content.
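Because cohere-ai is often missing from block-list templates, it is worth auditing the robots.txt you already have. The sketch below uses Python's standard-library `urllib.robotparser` to report which AI crawler tokens are still allowed to fetch your homepage. The function name `unblocked_ai_crawlers` is illustrative, and the token list holds only the crawlers named in this guide; treat it as an assumption to adapt to your own policy.

```python
from urllib.robotparser import RobotFileParser

# Product tokens to audit; GPTBot, ClaudeBot, and cohere-ai are the crawlers
# discussed in this guide. Extend the tuple to match your own policy.
AI_TOKENS = ("GPTBot", "ClaudeBot", "cohere-ai")


def unblocked_ai_crawlers(robots_txt: str, tokens=AI_TOKENS, url="https://example.com/"):
    """Return the tokens that robots_txt still allows to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [token for token in tokens if parser.can_fetch(token, url)]


# A template that blocks the well-known crawlers but forgets cohere-ai
TEMPLATE = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

print(unblocked_ai_crawlers(TEMPLATE))  # ['cohere-ai']
print(unblocked_ai_crawlers(TEMPLATE + "\nUser-agent: cohere-ai\nDisallow: /\n"))  # []
```

An empty list means every listed token is disallowed from the root path; it says nothing about crawlers that ignore robots.txt.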
