About
img2dataset is an open-source tool for downloading and resizing large image datasets for AI training. It is widely used by the ML community to build CLIP training sets and image-generation datasets. When run against a site, it downloads images in bulk and does not identify itself as a standard browser.
Purpose
Bulk image dataset collection for AI training
User Agent String
img2dataset
How to Control in robots.txt
🚫 Block img2dataset
User-agent: img2dataset
Disallow: /
✅ Allow img2dataset
User-agent: img2dataset
Allow: /
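Before relying on either rule set, it can help to confirm that it parses the way you expect. The sketch below uses Python's standard urllib.robotparser to evaluate the two rule sets against the img2dataset user agent token listed on this page. The helper name is_allowed and the example.com URL are illustrative assumptions. The check only shows how a compliant parser reads the rules, not how img2dataset itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets shown above, as they would appear in robots.txt.
BLOCK_RULES = "User-agent: img2dataset\nDisallow: /\n"
ALLOW_RULES = "User-agent: img2dataset\nAllow: /\n"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant parser would let user_agent fetch url.

    is_allowed is a hypothetical helper for this example only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/images/cat.jpg"  # placeholder URL
    print("block rules:", is_allowed(BLOCK_RULES, "img2dataset", url))
    print("allow rules:", is_allowed(ALLOW_RULES, "img2dataset", url))
    print("other bot under block rules:", is_allowed(BLOCK_RULES, "SomeOtherBot", url))
```

Agents that are not named in the file fall through to the parser's default, which is to allow the request when no wildcard rule is present.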
⚠️ img2dataset has been observed ignoring robots.txt directives. You may need server-level blocking (e.g., firewall rules or user-agent filtering) to effectively prevent access.
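As one possible form of user-agent filtering, the sketch below wraps a Python WSGI application in a small middleware that answers 403 Forbidden when the User-Agent header contains the img2dataset token. The names block_user_agents, BLOCKED_TOKENS, and demo_app are hypothetical and exist only for this example. It assumes the client actually sends that token; a User-Agent header can be changed by whoever runs the tool, so treat this as one layer alongside firewall rules rather than a complete defense.

```python
# Hypothetical token list; extend with other crawler tokens as needed.
BLOCKED_TOKENS = ("img2dataset",)


def block_user_agents(app, blocked_tokens=BLOCKED_TOKENS):
    """Wrap a WSGI app and reject requests whose User-Agent has a blocked token."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked_tokens):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Not a blocked agent: hand the request to the wrapped application.
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Stand-in application used only to demonstrate the middleware."""
    body = b"ok\n"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


application = block_user_agents(demo_app)
```

The same substring match can be expressed in a reverse proxy or firewall rule; doing it in front of the application keeps blocked requests from reaching your image files at all.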