Check whether your robots.txt lets the AI crawlers — GPTBot, ClaudeBot, Google-Extended, PerplexityBot, CCBot and more — read your site. Decide whether you want to appear in AI answers or keep your content out of training.
How it works
Enter your website URL or domain
Paste a URL or a bare domain and run the check. We fetch your live robots.txt from the root of the site (e.g. https://example.com/robots.txt), the same file a well-behaved AI crawler reads before it decides what it's allowed to download.
See which AI bots are allowed or blocked
We parse the User-agent groups and their Disallow rules, then check each known AI crawler (GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, anthropic-ai, Google-Extended, PerplexityBot, CCBot, Bytespider, Amazonbot, Applebot-Extended and meta-externalagent) against the group that applies to it. A group that names the bot wins over the catch-all User-agent: *, which is how the real crawlers resolve it.
Decide your AI strategy and update robots.txt
Each bot is marked Allowed or Blocked with a note on who runs it. Use that to make a deliberate choice — appear in AI answers, or keep your content out of AI training — then edit robots.txt and re-run to confirm the rules land the way you intended.
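The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the checker's actual code: robots_url, parse_groups, fully_blocked and report are hypothetical names, a bare domain is assumed to be served over HTTPS, and only whole-site Disallow: / rules are considered (Allow lines and path-level rules are ignored). Groups that name a bot are merged and take precedence over the * group, as RFC 9309 describes.

```python
from urllib.parse import urlsplit

# The 12 tokens discussed on this page.
AI_BOTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai",
    "Google-Extended", "PerplexityBot", "CCBot", "Bytespider", "Amazonbot",
    "Applebot-Extended", "meta-externalagent",
]


def robots_url(user_input: str) -> str:
    """Map 'example.com' or any page URL to the robots.txt at the host root."""
    text = user_input.strip()
    if "://" not in text:
        text = "https://" + text  # assumption: bare domains are served over HTTPS
    parts = urlsplit(text)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def parse_groups(robots_txt: str) -> list[tuple[list[str], list[str]]]:
    """Split the file into (lower-cased user-agent tokens, Disallow values) groups."""
    groups: list[tuple[list[str], list[str]]] = []
    agents: list[str] = []
    disallows: list[str] = []
    in_agent_lines = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if not sep:
            continue  # blank or malformed line
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if agents and not in_agent_lines:
                groups.append((agents, disallows))  # a rule line closed the previous group
                agents, disallows = [], []
            agents.append(value.lower())
            in_agent_lines = True
        elif field in ("allow", "disallow") and agents:  # rules outside a group are ignored
            if field == "disallow":
                disallows.append(value)
            in_agent_lines = False
    if agents:
        groups.append((agents, disallows))
    return groups


def fully_blocked(robots_txt: str, bot: str) -> bool:
    """True if the group(s) governing `bot` contain Disallow: /."""
    groups = parse_groups(robots_txt)
    token = bot.lower()
    governing = [d for agents, d in groups if token in agents]
    if not governing:  # no group names the bot, so it falls back to *
        governing = [d for agents, d in groups if "*" in agents]
    return any("/" in disallows for disallows in governing)


def report(robots_txt: str) -> dict[str, str]:
    return {bot: ("Blocked" if fully_blocked(robots_txt, bot) else "Allowed") for bot in AI_BOTS}


sample = "User-agent: *\nDisallow: /private/\n\nUser-agent: GPTBot\nUser-agent: CCBot\nDisallow: /\n"
print(robots_url("example.com"))  # https://example.com/robots.txt
print(report(sample)["GPTBot"], report(sample)["ClaudeBot"])  # Blocked Allowed
```

In the sample, ClaudeBot has no group of its own, so it inherits the * group, whose only rule blocks /private/ rather than the whole site.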
What we check
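robots.txt exists and is reachable — Confirms a robots.txt is actually served at the domain root. If there's no file (a 404) or it's empty, every AI crawler is allowed by default, because robots.txt only restricts; it never opens up access that wasn't already open. A server error is different: RFC 9309 tells crawlers to treat a robots.txt that returns a 5xx error as a full disallow.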
Per-bot allow/block status — Resolves the effective rule for each of the AI crawlers we track. We honor crawler precedence: a User-agent group naming the specific bot (or a substring of its name) overrides the wildcard User-agent: * group, which is how Google, OpenAI and the others actually apply your file.
Disallow: / detection — Flags a full block — a Disallow: / line in the group that matches a bot means that crawler is shut out of the entire site. We report that as Blocked so you can see at a glance who you've excluded.
Which company runs each bot — Labels every crawler with its operator and purpose — OpenAI (ChatGPT training, ChatGPT search, live browsing), Anthropic (Claude), Google (Gemini / AI training), Perplexity, Common Crawl, ByteDance, Amazon, Apple and Meta — so you know exactly what you're allowing or blocking.
Sitemap directive — Checks for a Sitemap: line in robots.txt. It's unrelated to AI access, but it's a free, easy win for normal crawling that's often missing — so we surface it while we're in the file.
Strategy guidance — Reminds you that allowing AI bots helps you show up in AI answers, while blocking protects content from training — and that neither choice carries an SEO ranking penalty. It's a content-strategy decision, not a ranking lever.
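The reachability and Sitemap checks can be sketched as two small functions. This is a hypothetical illustration (interpret_fetch and sitemap_urls are not a published API): it takes an HTTP status code and body that you have already fetched, maps them to the handling rules in RFC 9309, and then collects any Sitemap: lines.

```python
def interpret_fetch(status: int, body: str = "") -> str:
    """Map the result of GET /robots.txt to how a compliant crawler treats the site."""
    if 200 <= status < 300:
        # A served file is parsed; an empty one restricts nothing.
        return "parse rules" if body.strip() else "allow all"
    if 300 <= status < 400:
        return "follow redirect"  # RFC 9309 asks crawlers to follow at least five hops
    if 400 <= status < 500:
        return "allow all"  # "unavailable", e.g. a 404: no file, so nothing is restricted
    if 500 <= status < 600:
        return "disallow all"  # "unreachable": assume a complete disallow
    return "unknown status"


def sitemap_urls(body: str) -> list[str]:
    """Collect the values of Sitemap: lines (the field name is case-insensitive)."""
    urls = []
    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")  # split on the first colon only
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


print(interpret_fetch(404))  # allow all
print(sitemap_urls("User-agent: *\nDisallow:\nSitemap: https://example.com/sitemap.xml\n"))
```

Splitting on the first colon only keeps the https:// part of the sitemap URL intact.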
Common issues we catch
Blocking GPTBot but assuming you blocked ChatGPT search — GPTBot is OpenAI's training crawler. OAI-SearchBot indexes content for citations inside ChatGPT search, and ChatGPT-User fetches a page only when a person asks ChatGPT to read it. They're three separate tokens — blocking GPTBot does nothing to the other two. OpenAI explicitly supports allowing OAI-SearchBot while disallowing GPTBot.
Thinking Google-Extended affects Google Search rankings — Google-Extended only controls whether your content is used to train Gemini models and to ground answers in Gemini apps and Vertex AI. It does NOT change how Googlebot crawls or indexes you for normal Search. You can block Google-Extended and stay fully indexed in Google's web results; they're governed by different tokens.
Assuming robots.txt actually enforces the block — robots.txt is advisory. Well-behaved crawlers obey it, but it's a request, not a firewall — a bot can ignore it. Some AI agents (e.g. browser-style agents using a normal Chrome user-agent) carry no identifying token at all and can't be controlled from robots.txt. For real enforcement you need server-side or WAF blocking.
A wildcard Disallow: / accidentally locking out AI bots — A site under construction or a misconfigured staging rule often ships User-agent: * followed by Disallow: /. That also blocks every AI crawler that has no group of its own, even though no one named them; they simply inherit the wildcard group. If you want AI visibility, that single line is silently undoing it.
No robots.txt at all — everything is open — If the file is missing, people often assume the site is 'protected.' The opposite is true: with no robots.txt, every AI crawler is free to read the site. That may be exactly what you want for AI visibility, but it should be a decision, not an accident.
Using the wrong token name — Rules only work when the User-agent string matches what the crawler announces. Outdated or guessed names (CommonCrawl instead of CCBot, AnthropicBot instead of ClaudeBot or anthropic-ai) match nothing, so the bot you meant to block walks right in under the wildcard rule.
Rule placed in the wrong group — A bot obeys only the most specific User-agent group that names it and ignores the rest, regardless of where each group sits in the file. If you put a Disallow under User-agent: * but the bot also has its own, more permissive group anywhere in the file, the wildcard rule never applies to it. This is a frequent reason a 'blocked' bot is still allowed.
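Several of these pitfalls can be reproduced with Python's standard-library urllib.robotparser. It predates RFC 9309 and differs on some edge cases (it uses the first matching group in file order, consults the * group last, and treats a group name as matching when it appears inside the bot's name), but it resolves these whole-site cases the same way. The helper can_read_site and the sample files are illustrative.

```python
from urllib.robotparser import RobotFileParser


def can_read_site(robots_txt: str, token: str) -> bool:
    """Ask the stdlib parser whether `token` may fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, "https://example.com/")


# Blocking GPTBot says nothing about OpenAI's other two tokens.
only_gptbot = "User-agent: GPTBot\nDisallow: /\n"
# A leftover staging rule: every bot without its own group inherits it.
staging = "User-agent: *\nDisallow: /\n"
# A guessed token: no crawler announces "CommonCrawl", so CCBot is untouched.
wrong_name = "User-agent: CommonCrawl\nDisallow: /\n"
# ClaudeBot's own group beats the wildcard block, even though * comes first.
own_group = "User-agent: *\nDisallow: /\n\nUser-agent: ClaudeBot\nAllow: /\n"

print(can_read_site(only_gptbot, "GPTBot"), can_read_site(only_gptbot, "OAI-SearchBot"))  # False True
print(can_read_site(staging, "PerplexityBot"))  # False
print(can_read_site(wrong_name, "CCBot"))  # True
print(can_read_site(own_group, "ClaudeBot"), can_read_site(own_group, "GPTBot"))  # True False
```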
Where this matters
OpenAI — GPTBot, OAI-SearchBot, ChatGPT-User — Three independent tokens: GPTBot (model training), OAI-SearchBot (ChatGPT search citations) and ChatGPT-User (a human-initiated page fetch). Control them separately to get ChatGPT search presence without contributing to training, or vice versa.
Anthropic — ClaudeBot & anthropic-ai — ClaudeBot is Anthropic's current crawler for Claude; anthropic-ai is the legacy token still seen in older robots.txt files. We check both so a rule that only names one doesn't give you a false sense of coverage.
Google & Perplexity — Google-Extended & PerplexityBot — Google-Extended gates Gemini training and grounding, fully decoupled from Googlebot's Search indexing. PerplexityBot crawls for the Perplexity answer engine. Both return traffic and citations, so many sites deliberately leave them allowed.
Common Crawl, ByteDance, Amazon, Apple & Meta — CCBot (Common Crawl) feeds datasets used to train many models, so blocking it has outsized reach. We also check Bytespider (ByteDance/TikTok), Amazonbot (Amazon AI), Applebot-Extended (Apple AI training) and meta-externalagent (Meta AI training).
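If you settle on a split (answer and search bots allowed, training crawlers blocked), the file can be generated from two lists. This is a sketch: build_robots_txt is a hypothetical helper, and sorting each token into a "training" or "answer" list is an illustrative assumption based on the purposes described on this page, not an official classification from the operators. Tokens in neither list, such as Amazonbot here, fall through to the * group.

```python
# Illustrative grouping, an assumption based on the purposes described above.
TRAINING = ["GPTBot", "ClaudeBot", "anthropic-ai", "Google-Extended", "CCBot",
            "Bytespider", "Applebot-Extended", "meta-externalagent"]
ANSWER = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]


def build_robots_txt(block: list[str], allow: list[str], sitemap: str = "") -> str:
    """Emit one group for `block`, one for `allow`, then a catch-all * group."""
    lines: list[str] = []
    for tokens, rule in ((block, "Disallow: /"), (allow, "Disallow:")):
        if tokens:
            lines += [f"User-agent: {t}" for t in tokens]  # several agents share one group
            lines += [rule, ""]
    lines += ["User-agent: *", "Disallow:"]  # everyone else: an empty Disallow restricts nothing
    if sitemap:
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


print(build_robots_txt(TRAINING, ANSWER, "https://example.com/sitemap.xml"))
```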
Frequently asked questions
Will blocking AI bots hurt my Google rankings?
No. The AI-training tokens are separate from the crawler that indexes you for Search. Blocking GPTBot, ClaudeBot, CCBot or Google-Extended has no effect on your normal Google Search rankings. It only changes whether your content can be used by AI systems — which is a content-strategy choice, not an SEO penalty.
Should I block AI crawlers or allow them?
It depends on your goal. Allowing them helps your brand appear in ChatGPT, Gemini, Perplexity and other AI answers — increasingly a real source of referral traffic. Blocking them keeps your content out of AI training and answer engines. Many sites split the difference: allow the search/answer bots that send traffic, block the pure training crawlers.
Does Google-Extended stop Google from indexing my site?
No. Google-Extended only governs whether your content is used to train Gemini models and to ground answers in Gemini apps and Vertex AI. Googlebot, the crawler behind Google Search, is controlled by a different token entirely. You can block Google-Extended and remain fully indexed in regular Google results.
Is robots.txt enough to actually keep AI out of my content?
Not entirely. robots.txt is the Robots Exclusion Protocol — a request that well-behaved crawlers honor voluntarily. Reputable bots like GPTBot and ClaudeBot obey it, but it isn't enforced. A crawler can ignore it, and some AI agents use a generic browser user-agent with no identifying token at all. For hard enforcement, block at the server or WAF level.
What's the difference between GPTBot and ChatGPT-User?
GPTBot is the automated crawler that gathers content which may be used to train OpenAI's models. ChatGPT-User only fetches a specific page when a person explicitly asks ChatGPT to visit it — it's not bulk crawling or training. OAI-SearchBot is a third token that indexes pages for citations in ChatGPT search.
How do I block a specific AI bot in robots.txt?
Add a group naming the bot's exact user-agent token followed by a full disallow, for example a User-agent: GPTBot line followed by a Disallow: / line. To allow it everywhere, use Disallow: (empty) or simply leave the bot out of any blocking group. The token must be spelled exactly as the crawler announces it (matching ignores case) or the rule won't apply.
How long until my robots.txt change takes effect?
Crawlers re-fetch robots.txt periodically rather than on every request, so a change isn't instant. As a rough guide, OpenAI notes it can take around 24 hours for its search systems to pick up a robots.txt update. Other crawlers vary, but RFC 9309 asks crawlers not to rely on a cached copy for more than 24 hours, so well-behaved ones typically reflect a change within a day or two.
Why does the checker say a bot is allowed when I tried to block it?
The usual causes are a mismatched token name, a more specific (and permissive) group for that bot elsewhere in the file overriding your wildcard rule, or the disallow not being a full Disallow: / for the whole site. Each crawler obeys only the single most specific User-agent group that names it, so fix the token or the group and re-run.
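A mismatched token is the easiest of these causes to check mechanically. The sketch below is a hypothetical lint, not part of the checker: it flags User-agent values that look like a misspelling of a tracked token, using the two wrong names mentioned on this page plus a close-match heuristic from Python's difflib. The 0.8 cutoff is an arbitrary assumption, chosen so that legitimate names such as Applebot are not flagged.

```python
from difflib import get_close_matches

TRACKED = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai",
           "Google-Extended", "PerplexityBot", "CCBot", "Bytespider", "Amazonbot",
           "Applebot-Extended", "meta-externalagent"]
CANONICAL = {t.lower(): t for t in TRACKED}
# Wrong names called out on this page.
KNOWN_MISTAKES = {"commoncrawl": "CCBot", "anthropicbot": "ClaudeBot"}


def lint_user_agents(robots_txt: str) -> list[str]:
    """Warn about User-agent values that were probably meant to be a tracked AI token."""
    warnings = []
    for raw in robots_txt.splitlines():
        field, sep, value = raw.split("#", 1)[0].partition(":")
        if not sep or field.strip().lower() != "user-agent":
            continue
        name = value.strip()
        key = name.lower()
        if key == "*" or key in CANONICAL:
            continue  # wildcard or an exact (case-insensitive) match
        if key in KNOWN_MISTAKES:
            suggestion = KNOWN_MISTAKES[key]
        else:
            close = get_close_matches(key, list(CANONICAL), n=1, cutoff=0.8)
            if not close:
                continue  # probably a different crawler, not a typo
            suggestion = CANONICAL[close[0]]
        warnings.append(f"'{name}' matches no crawler we track; did you mean {suggestion}?")
    return warnings


print(lint_user_agents("User-agent: CommonCrawl\nUser-agent: GPT-Bot\nDisallow: /\n"))
```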
This is one of several free SEO tools from Custom Web Audits.
For a complete, prioritized analysis of your whole website, run a full audit.