Research · LLMs

The llms.txt Effect: 37,894 Domains, Zero Citation Advantage

GolOps research — 37,894 AI-cited domains scanned for llms.txt. 13.3% have the file. The citation advantage is zero. Mann-Whitney U p=0.85. A popular myth, checked against data.

GolOps Team

A narrative has formed around the llms.txt file: drop a text file in your site root and AI systems will start citing you more often. It gets sold as the one trick that moves AI visibility. The data does not show that.

GolOps scanned 37,894 domains that AI systems actually cite in their answers. 5,035 of them (13.3%) have an llms.txt file. The citation advantage for that group is statistically indistinguishable from zero: a Mann-Whitney U test gives p=0.85, nowhere near the conventional 0.05 significance threshold.

Metric                   Value
Domains scanned          37,894
Citations analyzed       337,000+
Brand snapshots          882
Domains with llms.txt    13.3%
Mann-Whitney U           p=0.85

Corpus: domains with two or more appearances in AI responses
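The p=0.85 comes from a Mann-Whitney U test, which compares ranks rather than raw values. Below is a minimal sketch of the two-sided test in plain Python, assuming the large-sample normal approximation with a tie correction and no continuity correction. The function name and toy data are hypothetical, and this is not the code behind the GolOps analysis.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U for sample x, z score, two-sided p-value). Tied values get
    average ranks and the variance uses the standard tie correction.
    No continuity correction is applied (an assumption of this sketch).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    # Pool both samples, remembering which group (0 = x, 1 = y) each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over runs of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence either way
        return u_x, 0.0, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail
    return u_x, z, p


# Hypothetical citation counts: similar middles, one outlier in the first group.
with_file = [1, 2, 3, 3, 4, 41]
without_file = [1, 2, 3, 3, 5, 14]
print(mann_whitney_u(with_file, without_file))
```

Because the test works on ranks, a single 41-citation outlier moves U far less than it moves the mean, which suits heavily right-skewed citation counts. For production work, `scipy.stats.mannwhitneyu` implements the same test; its asymptotic mode applies a continuity correction by default, so its p-values can differ slightly from this sketch.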

Key findings

13.3%: the share with llms.txt. Across all AI-cited domains, a little more than one in eight has the file. The ones who add it are betting on the future; they get no advantage today.

6.8 vs 6.7: average citations. Domains with llms.txt average 6.8 citations, those without 6.7. The difference is indistinguishable from noise. The median in both groups is exactly 3.0.

6%: adoption in the top 50. Among the fifty most-cited domains, only 3 (6%) have the file, the lowest share of any tier. Adoption climbs to about 16% further down the ranking before easing back, so the standard looks like something adopted by sites hoping for visibility, not by sites that already have it.

24.1% vs 0%: the category skew. Adoption is led by SaaS and developer tools, exactly the community that proposed the standard. References and review platforms sit at zero, and those are the categories with the highest citation rates.

The adoption curve by citation tier

Rank domains by AI citations and read adoption across tiers, and an inverse pattern shows up:

Citation tier           Share with llms.txt
Top 50                  6.0%
Top 100                 7.0%
Top 250                 13.6%
Top 500                 14.4%
Top 1000                15.3%
Top 2500                15.9%
Top 5000                16.1%
Top 10000               15.7%
Top 25000               13.7%
Full sample (37,894)    13.3%

The most-cited domains in AI do not run the file. Moving down the ranking, adoption first rises, peaking at 16.1% in the top 5,000, then settles back to 13.3% across the full sample. If llms.txt were a visibility lever, the curve would run the other way, with the top saturated with the file. Instead, only 3 of the top 50 domains have it.
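The tier table can be rebuilt from simple per-domain records. A minimal sketch, assuming each record carries a citation count and a has-file flag; the `Domain` record, the `adoption_by_tier` function, and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class Domain(NamedTuple):
    name: str
    citations: int
    has_llms_txt: bool


def adoption_by_tier(domains: Iterable[Domain], tiers: Iterable[int]) -> Dict[int, float]:
    """Share of llms.txt adopters inside each cumulative top-N citation tier.

    Tiers are cumulative (the top 100 includes the top 50). Python's sort is
    stable even with reverse=True, so domains with equal citation counts keep
    their input order and a cutoff that falls inside a tie is arbitrary.
    """
    ranked: List[Domain] = sorted(domains, key=lambda d: d.citations, reverse=True)
    shares: Dict[int, float] = {}
    for n in tiers:
        top = ranked[:n]
        shares[n] = sum(d.has_llms_txt for d in top) / len(top) if top else 0.0
    return shares


# Hypothetical toy data: the two most-cited domains lack the file.
sample = [
    Domain("a.example", 100, False),
    Domain("b.example", 90, False),
    Domain("c.example", 50, True),
    Domain("d.example", 40, True),
    Domain("e.example", 10, False),
]
print(adoption_by_tier(sample, (2, 4, 5)))  # prints {2: 0.0, 4: 0.5, 5: 0.4}
```

Reading cumulative tiers this way is what produces the inverse curve: a low share at the very top can coexist with a higher share once the middle of the ranking is included.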

So if the most-cited domains do not use llms.txt, what actually drives AI citations? The pattern is consistent with domain authority, content depth, and presence in training data, not a file in the site root.

The verdict: does the file help?

A direct comparison of the two groups closes the question.

Measure                        With llms.txt   Without llms.txt
Average citations per domain   6.8             6.7
Median citations               3.0             3.0
Mann-Whitney U                 p=0.85

The averages differ by a tenth of a citation — across 37,894 domains, indistinguishable from noise. The medians match exactly: both groups land on 3.0.
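Identical medians alongside slightly different averages is what right-skewed counts tend to produce: a handful of heavily cited domains pull the mean around while the middle of the distribution stays put. A toy illustration with hypothetical numbers, using Python's standard `statistics` module; `summarize` is an illustrative helper name.

```python
from statistics import mean, median
from typing import Dict, Sequence


def summarize(groups: Dict[str, Sequence[int]]) -> Dict[str, Dict[str, float]]:
    """Mean and median citations for each named group."""
    return {name: {"mean": mean(v), "median": median(v)} for name, v in groups.items()}


# Hypothetical right-skewed counts: same median, one outlier per group.
toy = {
    "with_llms_txt": [1, 2, 3, 3, 41],
    "without_llms_txt": [1, 2, 3, 5, 14],
}
for name, stats in summarize(toy).items():
    print(name, stats)
```

Here one outlier doubles a group's mean (10 vs 5) while both medians stay at 3, which is why the comparison leans on medians and a rank-based test rather than on averages.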

On the larger, unfiltered domain set (before the two-appearance cutoff), the test becomes technically significant (p<0.001), but purely because of sample size. The effect size is r=−0.065, whose magnitude sits below the 0.1 threshold conventionally used for even a "small" effect. That is statistical significance without practical significance. Having llms.txt gives no measurable advantage in AI citation frequency. Whatever drives source selection in AI answers, it is not llms.txt.

An independent analysis by SE Ranking reached the same conclusion across 300,000 domains: the file showed up on 10.13% of them and had no measurable link to AI citation frequency.
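The gap between "technically significant" and "practically meaningful" follows from how the effect size is computed. For a rank test, a common effect size is r = Z / √N, so for a fixed r the z score grows with the square root of the sample size. A short sketch with hypothetical sample sizes; the helper names are illustrative, and the 0.1 / 0.3 / 0.5 labels are Cohen's conventional cut-offs.

```python
import math
from statistics import NormalDist


def effect_size_r(z: float, n: int) -> float:
    """Effect size r = Z / sqrt(N) for a rank test with N total observations."""
    return z / math.sqrt(n)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def magnitude(r: float) -> str:
    """Cohen's conventional labels for |r|: 0.1 small, 0.3 medium, 0.5 large."""
    a = abs(r)
    if a < 0.1:
        return "negligible"
    if a < 0.3:
        return "small"
    if a < 0.5:
        return "medium"
    return "large"


# Hold r at -0.065 and vary N (hypothetical sizes): z grows with sqrt(N),
# so the same negligible effect eventually clears any p-value threshold.
for n in (100, 1_000, 10_000, 100_000):
    z = -0.065 * math.sqrt(n)
    print(f"N={n:>7,}  z={z:7.2f}  p={two_sided_p(z):.2g}  {magnitude(effect_size_r(z, n))}")

# Going the other way: the |z| implied by a two-sided p=0.85 at N=37,894.
z_085 = NormalDist().inv_cdf(1 - 0.85 / 2)
print(f"implied |r| = {effect_size_r(z_085, 37_894):.4f}")
```

The label never changes across the loop; only the p-value does. That is the sense in which a large sample can make a negligible effect "significant".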

Who adopts the file: the tech echo chamber

Break adoption down by domain category and you can see who is driving the standard:

Category                  Adoption   Ratio
SaaS / developer tools    24.1%      97 of 403
E-commerce                18.2%      10 of 55
News / media              15.7%      52 of 332
Social platforms          15.7%      84 of 536
Government / academic     11.5%      9 of 58
References / wikis        0.0%       0 of 36
Review sites              0.0%       0 of 39

Adoption is led by SaaS and developer tools at 24.1%, exactly the community behind the standard: llms.txt was proposed by Jeremy Howard of Answer.AI in September 2024. Government and academic sites trail far behind the top categories, and references and review sites are at zero.

This is also where selection bias would enter. The sites most likely to adopt llms.txt are already technically sophisticated, well-structured, and API-friendly without it, and those qualities can correlate with AI visibility on their own. Had adopters shown an edge, those qualities, not the file, would be the first suspect. The file rides in the second carriage; it does not pull the train.

The categories with the highest domain authority — references, review sites, academia — have the lowest llms.txt adoption. The domains that dominate AI citations do not need the file: they are cited for brand authority and content quality.

Citation leaders: with the file and without

The ten most-cited domains that do have llms.txt:

Domain                   Citations
prnewswire.com           1,070
github.com               449
chainalysis.com          291
accio.com                236
shopify.com              202
essfeed.com              200
sodimac.cl               160
slashdot.org             143
marketsandmarkets.com    137
trmlabs.com              134

The ten most-cited domains that do not have it:

Domain                   Citations
reddit.com               2,769
techradar.com            2,499
reuters.com              1,915
linkedin.com             1,579
forbes.com               1,479
youtube.com              1,344
wired.com                1,244
axios.com                1,015
ft.com                   945
theverge.com             943

The non-adopter column reads like a who's who of the internet's load-bearing sites. Reddit, Reuters, Forbes, and LinkedIn dominate AI citations without any llms.txt optimization. The top adopter, prnewswire.com at 1,070 citations, trails Reddit's 2,769 by a factor of about 2.6. Authority wins over the technical signal.

The same picture in the composite index

To rule out an artifact of raw counting, GolOps cross-checked the two groups on a composite AI visibility score that combines presence, rank, mentions, and sentiment across several AI models on a 0–100 scale. The cut is built on 205 brands that have both a website audit and active visibility monitoring.

Visibility score (0–100)   With llms.txt   Without llms.txt
Median                     23.1            23.6
Average                    27.8            26.3

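The medians differ by 0.5 points in favor of the group without the file; the averages differ by 1.5 points in favor of the group with it. Both gaps are small on a 0–100 scale, and they point in opposite directions. Whether you look at raw citations or a composite score, the result is the same: there is no sign that llms.txt is currently among the factors AI systems use to assemble a recommendation.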
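The post does not publish how the composite index weights its inputs. As an illustration only, here is one way a 0–100 score could combine presence, rank, mentions, and sentiment across several models. The weights, caps, class and function names below are hypothetical assumptions, not the GolOps formula.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional

# Hypothetical weights and caps for illustration only; the real index's
# weighting is not described in this post.
WEIGHTS = {"presence": 0.4, "rank": 0.3, "mentions": 0.2, "sentiment": 0.1}
MAX_RANK = 10      # positions beyond 10 earn no rank credit
MAX_MENTIONS = 5   # cap so one verbose answer cannot dominate


@dataclass
class ModelAnswer:
    present: bool          # brand appears in this model's answer
    rank: Optional[int]    # 1-based position among recommended brands
    mentions: int          # how many times the brand is named
    sentiment: float       # -1.0 (negative) to 1.0 (positive)


def answer_score(a: ModelAnswer) -> float:
    """Score one model's answer on a 0-1 scale."""
    if not a.present:
        return 0.0
    rank_part = max(0.0, 1 - (a.rank - 1) / MAX_RANK) if a.rank else 0.0
    parts = {
        "presence": 1.0,
        "rank": rank_part,
        "mentions": min(a.mentions, MAX_MENTIONS) / MAX_MENTIONS,
        "sentiment": (max(-1.0, min(1.0, a.sentiment)) + 1) / 2,
    }
    return sum(WEIGHTS[k] * v for k, v in parts.items())


def visibility_score(answers: List[ModelAnswer]) -> float:
    """Average the per-model scores and scale to 0-100."""
    return 100 * mean(answer_score(a) for a in answers) if answers else 0.0


# One model recommends the brand first; another omits it.
example = [ModelAnswer(True, 1, 5, 1.0), ModelAnswer(False, None, 0, 0.0)]
print(round(visibility_score(example), 1))  # prints 50.0
```

Averaging per-model scores keeps any single model from dominating the index; a composite built this way measures the same outcome as raw citations, just with more of the answer's context folded in.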

What this actually means

Having llms.txt tells AI systems "we want models to understand us," but that is a signal, not a lever: nothing in this data suggests that current models read or prioritize the file when they assemble citations. The search engines share the skepticism: Google's John Mueller compared llms.txt to the keywords meta tag, a long-ignored signal. Citations track what shaped the model in the first place: authoritative domains, frequently linked content, structured pages, topical relevance. A text file at /llms.txt does not retroactively rewrite what the model already learned.

None of which makes the file useless. It is cheap to implement and good structural practice, and if models start using it during retrieval-augmented generation, early adopters may benefit. But what drives AI visibility today runs deeper: authoritative content, a strong backlink profile, structured data, consistent publishing, topical expertise. We broke down what makes an individual page worth citing in The anatomy of an AI citation. The file is insurance for the future, not a lever on today's recommendations.

Methodology

What underpins the numbers:

  • 882 brand snapshots yielded 337,000+ citations across 102,000+ unique domains (aggregated AI-response data).
  • 37,894 domains were selected as having two or more citation appearances; the analysis ran on these.
  • llms.txt detection: asynchronous HTTP checks against /llms.txt with content validation to reject HTML error pages, soft 404s, and login redirects returning a 200 status.
  • Non-parametric test: Mann-Whitney U rather than a t-test, because citation distributions are heavily right-skewed.
  • Confound check: llms.txt adopters do not differ systematically in website-audit scores, which argues against "it's the site quality, not the file."
  • File quality: among adopters, 89% include a title, 98% contain URLs, and 79% score 4/4 on a content-quality rubric. The files are well implemented. They simply do not move citations.
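The detection and file-quality steps in the list above can be approximated with the Python standard library alone. A minimal sketch, assuming HTTPS on the bare domain and simple text heuristics. The helper names, the 200 KB read cap, the User-Agent string, and the 4-point rubric (title, blockquote summary, Markdown link, H2 section, loosely following the structure in the llms.txt proposal) are all hypothetical, not GolOps' actual checker or rubric.

```python
import asyncio
import re
import urllib.request
from dataclasses import dataclass
from typing import Dict, Iterable, Optional
from urllib.parse import urlparse


@dataclass
class Fetched:
    status: int
    final_url: str      # URL after any redirects
    content_type: str
    body: str


def fetch_llms_txt(domain: str, timeout: float = 10.0) -> Optional[Fetched]:
    """Blocking fetch of https://<domain>/llms.txt; None on any network error."""
    req = urllib.request.Request(f"https://{domain}/llms.txt",
                                 headers={"User-Agent": "llms-txt-check/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as r:
            body = r.read(200_000).decode("utf-8", errors="replace")
            return Fetched(r.status, r.geturl(), r.headers.get("Content-Type", ""), body)
    except (OSError, ValueError):  # HTTPError and URLError subclass OSError
        return None


def looks_like_llms_txt(resp: Fetched) -> bool:
    """Reject responses that answer 200 but are not a real llms.txt file."""
    if resp.status != 200:
        return False
    # Login redirects: the final URL no longer points at /llms.txt.
    if not urlparse(resp.final_url).path.endswith("/llms.txt"):
        return False
    text = resp.body.strip()
    if not text:
        return False
    # HTML error pages served with a 200 status.
    if "html" in resp.content_type.lower() or text[:200].lower().startswith(("<!doctype", "<html")):
        return False
    # Soft 404s: a plain-text "not found" message instead of content.
    first_line = text.splitlines()[0].lower()
    return not ("not found" in first_line or first_line.startswith("404"))


def rubric_score(text: str) -> int:
    """Hypothetical 4-point rubric based on the llms.txt proposal's structure."""
    checks = (
        re.search(r"^# \S", text, re.M),                   # H1 title
        re.search(r"^> \S", text, re.M),                   # blockquote summary
        re.search(r"\[[^\]]+\]\(https?://[^)]+\)", text),  # Markdown link with a URL
        re.search(r"^## \S", text, re.M),                  # H2 section
    )
    return sum(1 for c in checks if c)


async def scan(domains: Iterable[str], concurrency: int = 50) -> Dict[str, bool]:
    """Check many domains concurrently, running blocking fetches in threads."""
    sem = asyncio.Semaphore(concurrency)

    async def check(domain: str):
        async with sem:
            resp = await asyncio.to_thread(fetch_llms_txt, domain)
        return domain, resp is not None and looks_like_llms_txt(resp)

    return dict(await asyncio.gather(*(check(d) for d in domains)))
```

Running `asyncio.run(scan(["example.com"]))` performs a live check (Python 3.9+ for `asyncio.to_thread`). The semaphore bounds how many requests are in flight, so a scan of tens of thousands of domains does not open tens of thousands of sockets at once.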

The source is production AI-visibility monitoring data across 882 brands. What is measured is what models actually output, not ideal-condition responses.

The takeaway for practice

The myth offers a simple bargain: one file in your site root and AI systems start citing you more. The data does not back the bargain. Across 37,894 domains, the group with llms.txt and the group without it are cited indistinguishably: p=0.85, an effect size below the threshold for even a "small" effect. There is no advantage in raw citations, and none in the composite visibility index. The belief that "one trick" closes the AI-visibility question rests on convenience, not observation.

AI visibility is not a file but infrastructure: domain authority, content depth, presence in the training corpus, and the loop that measures and corrects all of it. That loop is what GolOps manages. We measure a company's position in the field of choice through the Choice Control Index, attribute the result to specific sources and scenarios, and translate the measurement into a prioritized plan. The Strategic Pilot delivers the first cycle in 10–12 weeks; the Command Center keeps the loop running continuously across seven AI systems. That is an answer at the infrastructure level, where a single file has no leverage.

Another technical signal that does not work as advertised:

Do AI Crawlers Prefer Markdown? A Controlled Experiment

What believing the myth costs

A company that added llms.txt and considered the task done stays exactly as invisible as it was before: the shortlist forms without it, and the procurement scenario runs without it. Gartner forecasts that by 2028, 90% of B2B procurement will run through autonomous AI agents, and Semrush already shows AI-channel conversion running 4.4× higher than organic search. The real field is moving while the budget goes to a non-lever. The cost of that belief is measured not in the file but in quarters of decisions made without you in the room, and each such quarter forfeits a share of that 90%.
