How to Block AI2Bot: Allen Institute's Two AI Crawlers Explained
AI2 operates two separate crawlers: AI2Bot for academic research indexing and Ai2Bot-Dolma for building the open-source Dolma training dataset. Different purposes — different blocking decisions.
Updated March 2026
Two Bots, Two Different Decisions
The Allen Institute for AI (AI2) is a nonprofit research lab in Seattle. Unlike commercial AI companies, AI2's work is primarily open-source and academic. Their two crawlers serve different purposes:
- AI2Bot: General research crawler. Indexes content for Semantic Scholar and other academic tools.
- Ai2Bot-Dolma: Training data crawler. Collects content for the Dolma dataset, used to train OLMo models.
What Does AI2Bot Do?
AI2Bot is the Allen Institute's general web crawler. Its primary purpose is feeding Semantic Scholar — a free, AI-powered academic search engine that indexes over 200 million scholarly papers and their web-based references. If you publish academic or research content, Semantic Scholar may index it via AI2Bot.
AI2Bot also supports other AI2 research initiatives. Unlike commercial crawlers, AI2Bot's output primarily benefits the academic community rather than generating revenue.
What Does Ai2Bot-Dolma Do?
Ai2Bot-Dolma is a specialized crawler built to collect data for the Dolma dataset — a massive open-source pretraining corpus containing approximately 3 trillion tokens. Dolma sources data from web pages (via Common Crawl and direct crawling), code repositories, academic papers, and encyclopedic content.
Dolma was used to train the OLMo (Open Language Model) family, AI2's openly released, fully documented language models. Because both the dataset and the models are open-source, your content could influence not just OLMo but also any model fine-tuned from it or trained on Dolma.
How to Block AI2Bot and Ai2Bot-Dolma
You can block each crawler independently. Here's the recommended configuration:
# Block AI training data collection
User-agent: Ai2Bot-Dolma
Disallow: /

# Allow academic research indexing (Semantic Scholar)
User-agent: AI2Bot
Allow: /

To block both crawlers instead:

User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /
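Before deploying the selective configuration, it can help to confirm how a parser reads it. The following is a minimal sketch using Python's standard-library `urllib.robotparser`. The `check_policy` helper and the `example.com` URL are illustrative assumptions, and a real crawler's parser may behave differently from Python's.

```python
from urllib.robotparser import RobotFileParser

# Selective policy from this guide: block the Dolma crawler, allow AI2Bot.
# The more specific token is listed first, as in the guide; some parsers
# apply the first group that matches a crawler's name.
ROBOTS_TXT = """\
User-agent: Ai2Bot-Dolma
Disallow: /

User-agent: AI2Bot
Allow: /
"""


def check_policy(robots_txt: str, url: str) -> dict[str, bool]:
    """Return whether each crawler token may fetch `url` under `robots_txt`.

    Hypothetical helper for illustration; it is not part of any AI2 tooling.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    agents = ["AI2Bot", "Ai2Bot-Dolma", "Googlebot"]
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# example.com is a placeholder domain, not a real target.
result = check_policy(ROBOTS_TXT, "https://example.com/research/paper.html")
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

The sketch passes bare tokens such as `AI2Bot` rather than full User-Agent headers, which is a simplification. Googlebot is included only to show that a crawler with no matching group is left unaffected.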
Note: Case sensitivity matters
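The Robots Exclusion Protocol (RFC 9309) calls for case-insensitive matching of user agent tokens, but some implementations compare them case-sensitively. To be safe, use AI2Bot (capital A, capital I, 2, capital B) and Ai2Bot-Dolma (capital A, lowercase i, 2, capital B, hyphen, capital D) exactly as shown.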
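One way to catch casing slips is a small lint pass over the file. The sketch below is only an illustration: `find_case_mismatches` and `CANONICAL_TOKENS` are hypothetical names, and the canonical spellings are simply the ones given in this guide.

```python
import re

# Canonical spellings as given in this guide.
CANONICAL_TOKENS = ("AI2Bot", "Ai2Bot-Dolma")


def find_case_mismatches(robots_txt: str) -> list[tuple[int, str, str]]:
    """Return (line_number, found, expected) for AI2 tokens with wrong casing."""
    by_lower = {token.lower(): token for token in CANONICAL_TOKENS}
    mismatches = []
    for number, line in enumerate(robots_txt.splitlines(), start=1):
        line = line.split("#", 1)[0]  # drop any trailing comment
        match = re.match(r"\s*user-agent\s*:\s*(\S+)", line, re.IGNORECASE)
        if not match:
            continue
        found = match.group(1)
        expected = by_lower.get(found.lower())
        # Report only tokens that name an AI2 crawler but differ in case.
        if expected is not None and found != expected:
            mismatches.append((number, found, expected))
    return mismatches


sample = """\
User-agent: ai2bot-dolma
Disallow: /

User-agent: AI2Bot
Allow: /
"""
print(find_case_mismatches(sample))
```

Here the first group would be flagged because `ai2bot-dolma` differs from the canonical `Ai2Bot-Dolma`, while the correctly cased `AI2Bot` group passes.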
The Academic Research Nuance
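AI2Bot is unusual among AI crawlers in that it serves a genuine academic purpose, so blocking it has different implications depending on your content. If you publish academic or research content, blocking AI2Bot may keep that content out of Semantic Scholar.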
What Blocking Does (and Doesn't) Do
Blocking prevents:
- AI2Bot: Your content appearing in Semantic Scholar
- Ai2Bot-Dolma: Your content entering the Dolma dataset
- Future OLMo model training on your content
- Downstream models built on Dolma/OLMo using your data
Blocking does not affect:
- Content already collected for Dolma
- Common Crawl data (Dolma's primary source); block CCBot separately
- Other AI crawlers (GPTBot, ClaudeBot, etc.)
- Google or Bing rankings (completely unaffected)
Frequently Asked Questions
Is AI2 the same as Allen AI?
Yes. The Allen Institute for AI (AI2) is commonly referred to as "Allen AI." It was founded in 2014 by Paul Allen (co-founder of Microsoft) and is headquartered in Seattle. It's a nonprofit research institute focused on AI research for the common good.
Does blocking Ai2Bot-Dolma actually help if Dolma already has my data?
Blocking prevents future crawls from adding new content. However, the initial Dolma dataset was built primarily from Common Crawl archives, which are publicly available. If your content was in Common Crawl, it may already be in Dolma regardless of your robots.txt for Ai2Bot-Dolma. Block CCBot to prevent future Common Crawl inclusion as well.
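To cover both routes into Dolma going forward, the Ai2Bot-Dolma and CCBot rules can be generated together and checked in one pass. This is a rough sketch under stated assumptions: `build_block_rules` is a hypothetical helper, and the check uses Python's `urllib.robotparser`, which may not match how each crawler actually interprets the file.

```python
from urllib.robotparser import RobotFileParser


def build_block_rules(agents: list[str]) -> str:
    """Build one 'Disallow: /' group per user agent token, separated by blank lines."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in agents]
    return "\n".join(groups)


# Tokens named in this guide: the Dolma crawler and Common Crawl's CCBot.
rules = build_block_rules(["Ai2Bot-Dolma", "CCBot"])
print(rules)

# Confirm that the generated rules block both tokens but leave AI2Bot alone.
parser = RobotFileParser()
parser.parse(rules.splitlines())
blocked = {
    agent: not parser.can_fetch(agent, "/")
    for agent in ["Ai2Bot-Dolma", "CCBot", "AI2Bot"]
}
print(blocked)
```

Neither rule removes content that is already in Common Crawl archives or in Dolma; both only affect future crawls.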
Should I treat AI2Bot differently from commercial crawlers?
That's a philosophical decision. AI2 is a nonprofit doing open-source research, which some publishers view differently from commercial AI labs monetizing their content. Others apply a blanket policy: no AI crawling is allowed regardless of the operator's mission. Both positions are valid.
Will blocking affect my SEO?
No. AI2Bot and Ai2Bot-Dolma are completely separate from Google, Bing, and all traditional search engines. Blocking has zero effect on your search rankings.