Beaconly

Free AI discoverability audit

Your site might be invisible to AI
even if it ranks on Google.

AI crawlers follow different rules than search engines. They check your per-bot robots.txt permissions, your llms.txt, your structured data, and your page signals before they index or cite anything. Beaconly runs 35 checks against the patterns AI systems actually use and shows you exactly what is missing.

Audit your site free

No signup. No credit card. Results in seconds.

AI is already the new search layer. Most sites are not configured for it.

3.6x more AI bot crawls than Googlebot in 2025
15.9% of ChatGPT users click cited sources
527% year-over-year growth in AI-referred traffic
45/100 average discoverability score on audited sites

The problem

Google bots and AI bots are not the same thing.

SEO is about ranking in search results. AI discoverability is about whether AI systems like ChatGPT and Perplexity can access, understand, and cite your content. The two use completely different files, different signals, and different bot identifiers.

  • AI crawlers require explicit per-bot permissions in robots.txt. A wildcard asterisk does not count as an allow.
  • AI systems depend on llms.txt, JSON-LD entity signals, and freshness markers that standard SEO tools do not check.
  • Retrieval bots like PerplexityBot send a referral roughly every 111 crawls. Training bots like ClaudeBot send one roughly every 23,951. They need different strategies.

Layer 01

AI Crawler Access

15 checks

Checks your robots.txt for per-bot allow rules and validates your llms.txt structure and sitemap. Covers 14 AI crawlers by name, including GPTBot, ClaudeBot, and PerplexityBot.

  • Per-bot allow rules in robots.txt for each major AI crawler
  • llms.txt presence, structure, and link integrity
  • Sitemap discovery and llms-full.txt extended summary
  • Flags wildcard-only allows, which AI crawlers do not honor
GPTBot ClaudeBot PerplexityBot Google-Extended CCBot Bytespider Applebot-Extended Amazonbot cohere-ai llms.txt llms-full.txt sitemap.xml

Most sites block more than half the AI bots listed. If GPTBot or ClaudeBot cannot reach your content, ChatGPT or Claude cannot cite it, no matter how strong that content is.
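A per-bot allow section might look like the sketch below (bot names are from the list above; the domain is a placeholder):

```txt
# Explicit per-bot allow rules — a wildcard "User-agent: *" group is not enough
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://example.com/sitemap.xml
```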

Layer 02

Schema and Structured Data

10 checks

Inspects your JSON-LD for the entity signals, freshness markers, and content types AI systems use when deciding what to cite. Checks @type, @id, sameAs, dateModified, speakable, and more.

  • Valid JSON-LD parse + recognized @type values per page
  • Entity identity via @id, sameAs, and Organization linking
  • Freshness signals: dateModified, datePublished, version
  • Speakable, FAQPage, and HowTo blocks AI systems prefer
@context @type @id sameAs dateModified datePublished Organization SoftwareApplication FAQPage speakable

Missing or malformed schema is one of the most common reasons sites are passed over during AI citation, even when their content is strong.
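As a sketch, a minimal JSON-LD block carrying the entity and freshness signals named above might look like this (the organization name, URLs, and dates are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "@id": "https://example.com/#app",
  "name": "Example App",
  "url": "https://example.com/",
  "datePublished": "2025-01-15",
  "dateModified": "2025-06-01",
  "publisher": {
    "@type": "Organization",
    "@id": "https://example.com/#org",
    "name": "Example Inc",
    "sameAs": [
      "https://github.com/example",
      "https://www.linkedin.com/company/example"
    ]
  }
}
```

The @id and sameAs fields tie the page to a stable entity; dateModified is the freshness marker.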

Layer 03

Page Structure

10 checks

Reviews your meta tags, heading structure, Open Graph data, canonical URL, HTTPS, and response speed. These signals tell AI systems whether a page is a reliable, well-structured source.

  • Title tag, meta description, and canonical URL on every page
  • Single H1 with logical H2/H3 outline AI parsers can follow
  • Open Graph and Twitter Card preview metadata
  • HTTPS, response time, and HTML weight under thresholds
title meta description canonical h1 outline og:title og:image twitter:card HTTPS response time HTML weight

AI systems prefer pages that are structurally clean, fast, and unambiguous. Poor page signals reduce citation confidence even when crawler access is fully configured.
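A head block covering the signals in this layer might look like the following (titles, descriptions, and URLs are illustrative):

```html
<head>
  <meta charset="utf-8">
  <title>Example Page — Example Inc</title>
  <meta name="description" content="One-sentence summary of what this page covers.">
  <link rel="canonical" href="https://example.com/page">
  <meta property="og:title" content="Example Page">
  <meta property="og:image" content="https://example.com/og.png">
  <meta name="twitter:card" content="summary_large_image">
</head>
```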

Know your crawlers

Training bots and retrieval bots need different configurations.

Training

Training bots build model knowledge over time.

ClaudeBot, GPTBot, CCBot, Bytespider

23,951:1 crawl-to-referral ratio

These bots crawl your content to build the model's underlying knowledge. They do not produce immediate citations, but sites they cannot access will be underrepresented in the model's long-term understanding of your brand and domain. The access decision you make today shapes future model behavior.

Retrieval

Retrieval bots power live AI search citations.

PerplexityBot, Google-Extended

111:1 crawl-to-referral ratio

These bots power live AI search results and AI overviews. A single crawl can surface your content as a cited source in Perplexity or Google AI Overviews the same day. Blocking them makes your site invisible to AI-powered search in real time, not just over the long term.
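One possible split for a site that wants live citations but prefers to opt out of training (a policy sketch, not a recommendation — bot classifications follow the lists above):

```txt
# Retrieval bots: allowed, for same-day citations
User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

# Training-only bots: opted out
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```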

Why Beaconly

Not just a score. Specific checks with specific fixes.

35

Checks across three tiers

Covers every signal AI systems use to discover and cite content: per-bot robots.txt rules, llms.txt structure, JSON-LD schema, Open Graph tags, heading structure, page speed, and more.

$0

No account. No credit card.

Beaconly is a free tool published by Orygn: no signup required, and built to be genuinely useful to anyone trying to understand AI discoverability.

Not flags. Actual fixes.

Every failed check includes the specific configuration, code snippet, or file change that makes it pass. Not generic recommendations that leave you guessing what to actually do.

Honest scoring. No wildcard credit.

A wildcard asterisk in robots.txt does not count as explicit AI crawler access. We check what AI bots actually require, not what SEO conventions suggest might be good enough.
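The no-wildcard-credit rule can be sketched in a few lines of Python. This is a simplified illustration, not Beaconly's actual implementation: it only checks whether a bot is named in its own User-agent group, not whether that group allows or disallows it.

```python
# Simplified sketch: report which AI bots have an explicit group in robots.txt.
# A "User-agent: *" group deliberately earns no credit.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]

def explicit_ai_access(robots_txt: str) -> dict[str, bool]:
    """Return {bot: True} only if the bot appears in its own User-agent line."""
    named = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.lower().startswith("user-agent:"):
            named.add(line.split(":", 1)[1].strip().lower())
    return {bot: bot.lower() in named for bot in AI_BOTS}

robots = """
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /
"""
print(explicit_ai_access(robots))
# → {'GPTBot': True, 'ClaudeBot': False, 'PerplexityBot': False, 'CCBot': False}
```

Only GPTBot gets credit here: the wildcard group covers the other bots under SEO conventions, but not under the explicit-access rule.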

FAQ

Common questions about AI discoverability.

  • AI discoverability is whether AI systems like ChatGPT, Perplexity, and Claude can access, understand, and cite your website. It depends on technical signals in your robots.txt, llms.txt, structured data, and page structure that are separate from traditional SEO.

  • SEO is about ranking in Google search. AI discoverability is about whether AI systems cite you in their answers. The two share almost nothing technically: different bots, different files, different signals, and different outcome metrics. A site can rank on page one of Google while being completely inaccessible to AI crawlers.

  • Training bots like ClaudeBot and GPTBot build the underlying model's knowledge from your content over time. Retrieval bots like PerplexityBot power live AI search results and can surface your content in citations the same day they crawl it. Both need explicit per-bot access in robots.txt, but the urgency is different.

  • llms.txt is a file at your domain root that gives AI models a structured summary of your site: what it covers, who runs it, and which pages matter most. It is not required but helps AI systems understand your site without crawling every page. Beaconly checks whether yours exists and follows the expected format.
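Per the llms.txt proposal, the expected shape is a small markdown file: an H1 with the site name, a blockquote summary, then H2 sections of annotated links. A hypothetical example (all names and URLs are placeholders):

```markdown
# Example Inc

> Example Inc makes developer tools for widget automation. This site hosts
> product docs, pricing, and a changelog.

## Docs

- [Quickstart](https://example.com/docs/quickstart): Install and first run
- [API reference](https://example.com/docs/api): Endpoints and authentication

## Optional

- [Changelog](https://example.com/changelog): Release history
```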

  • Usually under 10 seconds. Beaconly fetches your robots.txt, llms.txt, llms-full.txt, sitemap, and page HTML in parallel, then analyzes 35 checks across three tiers and returns a full report with specific fixes.

Free audit tool

Find out if AI can
actually find your site.

35 checks across robots.txt, llms.txt, structured data, and page signals. No account. No credit card. Results in seconds.

Audit your site free