How is this different from Google Analytics?

Google Analytics shows you traffic. Shadow shows you traffic, AI bot activity, what AI platforms say about your brand, AND tells you what to do about all of it. It's analytics + AI intelligence + action steps in one tool.

Do I need to install anything?

For basic monitoring (bot detection, AI perception, readiness score) — nope, just enter your URL. For full visitor analytics (clicks, behavior, sessions), add one script tag. One-click integrations for Vercel, Shopify, WordPress, and more.

Will it slow down my site?

No. The script is under 5KB and loads asynchronously, so it doesn't block page rendering and has a negligible effect on page speed and Core Web Vitals. External monitoring has no impact at all, because it watches from the outside.

What AI bots does Shadow detect?

All of them. GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, Amazonbot, and dozens more. The Shadow Network means new bots get identified across all users instantly.

What do you mean by "actionable steps"?

Shadow doesn't just show you graphs. It says things like: "ChatGPT has your pricing wrong — add structured data to /pricing to fix it" or "Your bounce rate on /features is 68% — here's why and what to change." Specific, do-it-today recommendations.

Can Shadow block bots?

Shadow is a telescope, not a shield. It shows you who's visiting and what AI says about you. It generates block rules and robots.txt configs you can apply — but it doesn't intercept traffic.
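The exact rule format Shadow generates isn't described here. As a rough illustration of what a generated robots.txt config looks like, the sketch below uses a hypothetical helper, build_robots_txt, that writes one group per user agent token. It is an assumption for illustration, not Shadow's code.

```python
def build_robots_txt(blocked_agents, allowed_agents=()):
    """Return robots.txt text with one group per user agent token.

    Hypothetical helper for illustration only.
    """
    groups = []
    # Disallow groups first, then explicit Allow groups.
    for token in blocked_agents:
        groups.append(f"User-agent: {token}\nDisallow: /")
    for token in allowed_agents:
        groups.append(f"User-agent: {token}\nAllow: /")
    # Separate groups with a blank line and end the file with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(["Ai2Bot-Dolma", "GPTBot"], ["AI2Bot"]))
```

The helper emits Disallow groups before Allow groups. In this example, that places Ai2Bot-Dolma ahead of AI2Bot. The order matters for first-match parsers such as Python's urllib.robotparser, which would otherwise let an AI2Bot group match Ai2Bot-Dolma as a substring.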

Is Shadow privacy-friendly?

Yes. Shadow never collects PII. IP addresses are hashed after classification. No cookies on your visitors. All Shadow Network data is anonymized. GDPR compliant by design.
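Shadow's hashing scheme isn't documented here. The sketch below is a generic example of keyed (HMAC-SHA-256) IP hashing, not Shadow's implementation. The function name pseudonymize_ip and the per-run secret key are assumptions. A secret key matters because a plain, unkeyed hash of an IPv4 address can be reversed by brute force, since there are only about 4.3 billion possible addresses.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key. A real deployment would load a stored,
# periodically rotated key instead of generating one on every run.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize_ip(ip: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest (hex) of an IP address.

    The same IP and key always give the same digest, so visits can still
    be grouped, but the raw address does not need to be stored.
    """
    return hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # 203.0.113.0/24 is a documentation-only address range.
    print(pseudonymize_ip("203.0.113.7"))
```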

What is AI2Bot and what does it do?

AI2Bot is the general web crawler operated by the Allen Institute for AI (AI2). It collects web content to support AI2's academic research initiatives, including Semantic Scholar, a free academic search engine. Unlike commercial training crawlers such as GPTBot or ClaudeBot, AI2Bot serves academic research and open-source AI development rather than a proprietary commercial product.

What is Ai2Bot-Dolma and how is it different from AI2Bot?

Ai2Bot-Dolma is a specialized crawler that AI2 used to build the Dolma dataset, a massive open-source pretraining corpus containing approximately 3 trillion tokens from diverse web sources. Dolma was used to train the OLMo (Open Language Model) family, AI2's openly released language models. While AI2Bot supports ongoing research indexing, Ai2Bot-Dolma was purpose-built for a single open-source dataset collection effort. The two crawlers use separate user agent tokens and may appear independently in your server logs.
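To see how often each crawler shows up in your logs, you can classify user agent strings by token. This is a minimal sketch that assumes you have already extracted the user agent field from each log line. The sample strings below are made up, and real AI2 user agents may contain more detail. The more specific token is checked first, because "ai2bot" is a substring of "ai2bot-dolma" once both are lowercased.

```python
from collections import Counter

# Order matters: check the more specific token before the shorter one.
AI2_TOKENS = ("Ai2Bot-Dolma", "AI2Bot")


def classify_ai2(user_agent: str):
    """Return the matching AI2 token for a user agent string, or None."""
    ua = user_agent.lower()
    for token in AI2_TOKENS:
        if token.lower() in ua:
            return token
    return None


def count_ai2_hits(user_agents):
    """Count requests per AI2 crawler in an iterable of user agent strings."""
    counts = Counter()
    for ua in user_agents:
        token = classify_ai2(ua)
        if token is not None:
            counts[token] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical user agent strings for illustration.
    sample = [
        "Mozilla/5.0 (compatible) AI2Bot",
        "Mozilla/5.0 (compatible) Ai2Bot-Dolma",
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
    ]
    print(count_ai2_hits(sample))
```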

Do AI2Bot and Ai2Bot-Dolma respect robots.txt?

Yes. The Allen Institute for AI is a respected nonprofit AI research organization based in Seattle, Washington. Both AI2Bot and Ai2Bot-Dolma respect robots.txt Disallow directives. AI2 has publicly documented its crawlers and their robots.txt compliance, which is consistent with its academic mission and commitment to transparency.

Should I block AI2Bot versus Ai2Bot-Dolma differently?

Yes — they're worth treating as separate decisions. AI2Bot primarily supports academic research and tools like Semantic Scholar. If you publish academic content, having it indexed by Semantic Scholar may benefit you. Ai2Bot-Dolma was used to build a training dataset (Dolma) for language models (OLMo). If your concern is AI model training, Ai2Bot-Dolma is the one to block. You can block Ai2Bot-Dolma specifically while continuing to allow AI2Bot.

What is Dolma and which AI models was it used to train?

Dolma is an open-source pretraining dataset released by the Allen Institute for AI in 2023–2024. It contains approximately 3 trillion tokens sourced from web pages (primarily via Common Crawl), code repositories, academic papers, encyclopedic content, and other public sources. Dolma was used to train the OLMo (Open Language Model) family — AI2's openly-released, fully-documented language models. Because Dolma is open-source, its training data also influences downstream models built on OLMo or fine-tuned from it.

What are the user agent tokens for AI2Bot and Ai2Bot-Dolma?

AI2Bot uses the user agent token AI2Bot (capital A, capital I, 2, capital B). Ai2Bot-Dolma uses Ai2Bot-Dolma (capital A, lowercase i, 2, capital B, hyphen, capital D). The robots.txt standard (RFC 9309) requires crawlers to match these tokens case-insensitively, but not every implementation follows it, so use the exact casing shown. In robots.txt, add User-agent: AI2Bot followed by Disallow: / and, as a separate group, User-agent: Ai2Bot-Dolma followed by Disallow: /.

Will blocking AI2Bot affect my SEO or search rankings?

No. AI2Bot and Ai2Bot-Dolma are completely separate from Google, Bing, or any traditional search engine crawler. Blocking either bot has zero effect on your rankings in Google Search, Bing, or any other traditional search engine. The only effect is on your presence in AI2's research tools (like Semantic Scholar) and in future models trained on Dolma or related datasets.

Allen Institute for AI | Respects robots.txt | Research + Training

How to Block AI2Bot: Allen Institute's Two AI Crawlers Explained

AI2 operates two separate crawlers: AI2Bot for academic research indexing and Ai2Bot-Dolma for building the open-source Dolma training dataset. Different purposes — different blocking decisions.

Updated March 2026

Two Bots, Two Different Decisions

The Allen Institute for AI (AI2) is a nonprofit research lab in Seattle. Unlike commercial AI companies, AI2's work is primarily open-source and academic. Their two crawlers serve different purposes:

AI2Bot: General research crawler. Indexes content for Semantic Scholar and other academic tools.

Ai2Bot-Dolma: Training data crawler. Collects content for the Dolma dataset, used to train OLMo models.

What Does AI2Bot Do?

AI2Bot is the Allen Institute's general web crawler. Its primary purpose is feeding Semantic Scholar — a free, AI-powered academic search engine that indexes over 200 million scholarly papers and their web-based references. If you publish academic or research content, Semantic Scholar may index it via AI2Bot.

AI2Bot also supports other AI2 research initiatives. Unlike commercial crawlers, AI2Bot's output primarily benefits the academic community rather than generating revenue.

What Does Ai2Bot-Dolma Do?

Ai2Bot-Dolma is a specialized crawler built to collect data for the Dolma dataset — a massive open-source pretraining corpus containing approximately 3 trillion tokens. Dolma sources data from web pages (via Common Crawl and direct crawling), code repositories, academic papers, and encyclopedic content.

Dolma was used to train the OLMo (Open Language Model) family — AI2's openly-released, fully-documented language models. Because both the dataset and the models are open-source, your content could influence not just OLMo, but any downstream model fine-tuned from it.

How to Block AI2Bot and Ai2Bot-Dolma

You can block each crawler independently. Here's the recommended configuration:

robots.txt: Block training only, allow research

# Block AI training data collection
User-agent: Ai2Bot-Dolma
Disallow: /

# Allow academic research indexing (Semantic Scholar)
User-agent: AI2Bot
Allow: /
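You can sanity-check this configuration before deploying it by feeding it to Python's standard-library urllib.robotparser. That parser is only one implementation, and each crawler uses its own. Treat the result as a check of the file's logic, not a guarantee of how AI2's crawlers behave. This parser uses the first group whose name appears, case-insensitively, as a substring of the crawler's token. Keeping the Ai2Bot-Dolma group above the AI2Bot group, as this file does, stops the AI2Bot group from also matching Ai2Bot-Dolma.

```python
from urllib.robotparser import RobotFileParser

# The "block training only, allow research" configuration.
ROBOTS_TXT = """\
# Block AI training data collection
User-agent: Ai2Bot-Dolma
Disallow: /

# Allow academic research indexing (Semantic Scholar)
User-agent: AI2Bot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Ai2Bot-Dolma", "AI2Bot"):
    allowed = parser.can_fetch(agent, "https://example.com/papers/")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

This prints "Ai2Bot-Dolma: blocked" followed by "AI2Bot: allowed".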

robots.txt: Block both AI2 crawlers

User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /
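The same standard-library parser can show that these rules only affect the two AI2 tokens. Googlebot and CCBot are used here as examples of crawlers with no matching group. They fall through to the parser's default of "allowed", which is why CCBot needs its own group if you want to block it.

```python
from urllib.robotparser import RobotFileParser

# The "block both AI2 crawlers" configuration.
ROBOTS_TXT = """\
User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Agents with no matching group are allowed by default.
for agent in ("AI2Bot", "Ai2Bot-Dolma", "Googlebot", "CCBot"):
    allowed = parser.can_fetch(agent, "https://example.com/")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```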

Note: Case sensitivity matters

The robots.txt standard (RFC 9309) requires crawlers to match user agent tokens case-insensitively, but not every implementation follows it. Use AI2Bot (capital A, capital I, 2, capital B) and Ai2Bot-Dolma (capital A, lowercase i, 2, capital B, hyphen, capital D) exactly as shown.
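As one example of standards-style matching, Python's urllib.robotparser lowercases both the group name and the crawler token before comparing them. The deliberately lowercase group below still applies to the exact-case token. Other parsers may be stricter, which is why exact casing remains the safe choice.

```python
from urllib.robotparser import RobotFileParser

# Group name written in lowercase on purpose.
ROBOTS_TXT = """\
User-agent: ai2bot-dolma
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Both sides are lowercased before comparison, so this prints False (blocked).
print(parser.can_fetch("Ai2Bot-Dolma", "https://example.com/"))
```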

The Academic Research Nuance

AI2Bot differs from most AI crawlers in that it primarily serves academic research. Blocking it has different implications depending on your content:

🎓

Academic publishers & researchers

Allowing AI2Bot means your papers and research may appear in Semantic Scholar, increasing discoverability and citations. Most researchers benefit from this.

📰

News and media sites

AI2Bot indexing your journalism for academic research tools is relatively low-risk compared to training crawlers. The content typically appears as a citation, not a full reproduction.

🔒

Paywalled content providers

Even academic indexing may surface content summaries. If your paywall is your business model, blocking both crawlers is the conservative choice.

What Blocking Does (and Doesn't) Do

What it stops

• AI2Bot: Your content appearing in Semantic Scholar
• Ai2Bot-Dolma: Your content entering the Dolma dataset
• Future OLMo model training on your content
• Downstream models built on Dolma/OLMo using your data

What it doesn't stop

• Content already collected for Dolma
• Common Crawl data (Dolma's primary source) — block CCBot separately
• Other AI crawlers (GPTBot, ClaudeBot, etc.)
• Google or Bing rankings (completely unaffected)

Frequently Asked Questions

Is AI2 the same as Allen AI?

Yes. The Allen Institute for AI (AI2) is commonly referred to as "Allen AI." It was founded in 2014 by Paul Allen (co-founder of Microsoft) and is headquartered in Seattle. It's a nonprofit research institute focused on AI research for the common good.

Does blocking Ai2Bot-Dolma actually help if Dolma already has my data?

Blocking prevents future crawls from adding new content. However, the initial Dolma dataset was built primarily from Common Crawl archives, which are publicly available. If your content was in Common Crawl, it may already be in Dolma regardless of your robots.txt for Ai2Bot-Dolma. Block CCBot to prevent future Common Crawl inclusion as well.

Should I treat AI2Bot differently from commercial crawlers?

That's a philosophical decision. AI2 is a nonprofit doing open-source research, which some publishers view differently from commercial AI labs monetizing their content. Others apply a blanket policy: no AI crawling is allowed regardless of the operator's mission. Both positions are valid.

Will blocking affect my SEO?

No. AI2Bot and Ai2Bot-Dolma are completely separate from Google, Bing, and all traditional search engines. Blocking has zero effect on your search rankings.


Is your site protected from AI bots?

Run a free scan to check your robots.txt, meta tags, and overall AI readiness score.