Where AI Finds Its Answers. The Anatomy of 15 Million Citations
GolOps research — 15M citations from AI output, 1,174 brands, 265K domains. The map of sources large language models rely on, and the field of choice where AI visibility is decided.

When a large language model answers a user, it draws its links from a narrow, concentrated field of sources. Access to that field is limited: it formed without the participation of most companies, and it operates by rules that do not reduce to advertising budgets.
GolOps measured this field. 15 million citations from live AI responses, 1,174 observed brands, 265,000 unique domains, 1,050,000 links — over a 90-day window. The data source is the actual responses of AI systems, the ones real users saw.
| Metric | Value |
|---|---|
| Citations analyzed | 15,000,000+ |
| Domains tracked | 265,000 |
| Brands in sample | 1,174 |
| Unique URLs | 1,050,000 |
Data window: 90 days
Key findings
17% — Wikipedia's share. One source holds a sixth of the entire visible field of AI output. At this level of concentration, ordinary visibility tactics lose their meaning. The real competition plays out for the remaining 83% of the citation flow.
60,000+ — the long tail. That is how many domains divide that 83% among themselves. The real infrastructure of citation begins with entry into the first thousand domains.
1.2× — listicle headlines. Pages built around "Top N" or "N best 2026" formats are cited about 20% more often than standard product pages. Headline format is a measurable lever.
83% — top-tier stability. Most sources at the top of the distribution have held their positions for six months. AI choice is inertial — the window for entering the top tier is shorter than it looks.
A map of top sources
Over a 30-day window — the fifteen domains with the most citations:
| # | Domain | Citations | Source type |
|---|---|---|---|
| 1 | youtube.com | 236,322 | Video / UGC |
| 2 | en.wikipedia.org | 88,807 | Reference |
| 3 | reddit.com | 83,578 | Social platform |
| 4 | forbes.com | 28,382 | Media |
| 5 | pmc.ncbi.nlm.nih.gov | 26,905 | Academic |
| 6 | linkedin.com | 25,564 | Social platform |
| 7 | gartner.com | 25,444 | Industry analytics |
| 8 | edmunds.com | 23,997 | Vertical aggregator |
| 9 | g2.com | 22,638 | Review platform |
| 10 | facebook.com | 18,737 | Social platform |
| 11 | clutch.co | 17,087 | B2B directory |
| 12 | cars.com | 16,822 | Vertical aggregator |
| 13 | carfax.com | 14,223 | Vertical aggregator |
| 14 | nerdwallet.com | 13,902 | Finance aggregator |
| 15 | tripadvisor.com | 13,631 | Review platform |
These are not the "top media" and not a list of thought leaders. The top of the distribution is held by structured knowledge bases, video platforms, review aggregators, and narrow vertical references. AI turns to places where data is marked up and checkable. A loud brand without that structural markup does not get pulled into the sample.
The power law of distribution
The full sample of 15M links forms a classic power-law function.
| Position | Domain | Share | Citations |
|---|---|---|---|
| 1 | en.wikipedia.org | 4.26% | 639,396 |
| 2 | youtube.com | 2.64% | 396,239 |
| 3 | reddit.com | 0.96% | 144,320 |
| 4 | forbes.com | 0.44% | 66,708 |
| 5 | linkedin.com | 0.37% | 55,529 |
| 6 | techradar.com | 0.35% | 52,055 |
| 7 | g2.com | 0.33% | 49,091 |
| 8 | gartner.com | 0.31% | 46,428 |
| 9 | pmc.ncbi.nlm.nih.gov | 0.29% | 43,902 |
| 10 | edmunds.com | 0.24% | 35,884 |
| 11 | clutch.co | 0.22% | 32,739 |
| 12 | facebook.com | 0.20% | 29,635 |
| 13 | nerdwallet.com | 0.19% | 28,937 |
| 14 | cars.com | 0.17% | 24,892 |
| 15 | tripadvisor.com | 0.15% | 22,625 |
Wikipedia holds over 4% of all URL citations; the next source holds half that. By position twenty, the share drops below 0.15%; by position one hundred, below 0.06%. After that comes the long tail of tens of thousands of domains, each capturing hundredths of a percent or less.
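The share column follows directly from the raw counts. A quick sketch using the table's own numbers, with the top five rows and the full 15M-citation sample as the denominator:

```python
# Reproducing the share column from the raw citation counts. Counts are
# the top-five rows of the table above; the denominator is the full
# 15M-citation sample.
TOTAL_CITATIONS = 15_000_000

top_domains = {
    "en.wikipedia.org": 639_396,
    "youtube.com": 396_239,
    "reddit.com": 144_320,
    "forbes.com": 66_708,
    "linkedin.com": 55_529,
}

for domain, count in top_domains.items():
    print(f"{domain:<20} {count / TOTAL_CITATIONS:.2%}")
# en.wikipedia.org comes out at 4.26%, matching the table;
# by the fifth position the share is already below 0.4%.
```

The steepness is the point: each step down the ranking roughly halves the share, which is why the tail of the curve, not the head, is where most domains live.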
The useful reading of this curve is not "become Wikipedia". The real competition for AI visibility plays out in the 0.01–0.1% band, and that is where you find large vertical resources, niche references, corporate blogs, and specialized aggregators. This is the manageable field of choice.
Source types
When 15M citations are broken down by category, the picture stops matching marketing intuition:
| Category | Share | Examples |
|---|---|---|
| Vertical resources and other | 86.5% | gartner.com, edmunds.com, clutch.co, nerdwallet.com |
| Social platforms / UGC | 4.7% | youtube.com, reddit.com, linkedin.com, facebook.com, tiktok.com |
| References and encyclopedias | 4.5% | en.wikipedia.org, investopedia.com, de.wikipedia.org |
| Media | 1.1% | forbes.com, reuters.com, axios.com, businessinsider.com |
| Review platforms | 1.0% | g2.com, tripadvisor.com, m.yelp.com, consumerreports.org |
| Tech publications | 0.6% | techradar.com, wired.com, tomsguide.com, theverge.com |
| Academic publications | 0.5% | pmc.ncbi.nlm.nih.gov, sciencedirect.com, arxiv.org |
| App stores | 0.2% | apps.apple.com, play.google.com |
| Documentation | 0.2% | aws.amazon.com, learn.microsoft.com |
| Press releases | 0.2% | prnewswire.com, businesswire.com |
| E-commerce | 0.2% | amazon.com, shopify.com, walmart.com |
| Market research | 0.2% | marketsandmarkets.com, mordorintelligence.com |
| Developer | 0.2% | github.com, dev.to, stackoverflow.com |
The "Vertical resources and other" category — 86.5% — is where most companies actually compete. All industry sites, corporate resources, specialized references, trade platforms, aggregators — everything that does not fit standard labels. This is the real infrastructure of citation, and this is what requires a control loop.
The phantom of social platforms
A persistent industry narrative says: "Reddit is the key to AI visibility, UGC wins, forums are the gold source." This narrative does not survive contact with the data.
Social platforms and UGC together account for 4.7% of citations; Reddit alone, 0.96%. References as a class deliver 4.5%, and that 4.5% does heavier lifting in practice: a single Wikipedia citation in a factual query often forms the backbone of the entire answer.
Large language models reach for Reddit and YouTube when the query is subjective: "best running headphones" or "honest user reviews". In factual and B2B queries, social platforms disappear.
The reason is simple. AI prefers sources with predictable structure: an extractable answer in a consistent markup format. Wikipedia wins on exactly that; its content is laid out to be pulled directly. Social platforms are dialogic and contradictory, with no stable citation frame. For a short answer in a chat window, they are too noisy.
This does not amount to "forget Reddit". It points somewhere else: AI visibility cannot rest on a UGC layer alone. Social platforms cover up to 5% of the field. The other 95% plays by different rules.
Query intent
GolOps broke down 23,093 unique prompts by intent type. The distribution shows how users actually turn to AI:
| Query type | Share | What the user is looking for |
|---|---|---|
| "Best / top N" | 35.1% | Ready-made shortlists, rankings, recommendations |
| Comparison "X vs Y" | 9.7% | Paired evaluations, choice between alternatives |
| "How to" | 3.1% | Step-by-step instructions |
| "Find / discover" | 2.0% | New, previously unknown options |
| "Alternatives to X" | 0.2% | Replacement for a known solution |
| Factual | 49.9% | Definitions, numbers, references |
One in three queries is a search for a list. That means an AI answer about your category will, with high probability, arrive as a ranked shortlist. The deciding condition for inclusion is whether you are mentioned in the sources the model relies on when assembling that list.
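The intent buckets above can be illustrated with a minimal pattern-matching sketch. The regex rules here are illustrative assumptions, not the actual GolOps classifier; anything unmatched falls into the residual "factual" bucket, which the data shows is roughly half the sample:

```python
import re

# Illustrative intent bucketing in the spirit of the table above.
# Patterns are assumptions for demonstration; "factual" is the
# catch-all for prompts that match no explicit intent marker.
INTENT_PATTERNS = [
    ("best_top_n",   re.compile(r"\b(?:best|top\s*\d+)\b", re.I)),
    ("comparison",   re.compile(r"\bvs\.?\b|\bversus\b|\bcompared?\s+(?:to|with)\b", re.I)),
    ("how_to",       re.compile(r"^how\s+(?:to|do|can)\b", re.I)),
    ("alternatives", re.compile(r"\balternatives?\s+to\b", re.I)),
    ("discover",     re.compile(r"\b(?:find|discover)\b", re.I)),
]

def classify_intent(prompt: str) -> str:
    for label, pattern in INTENT_PATTERNS:
        if pattern.search(prompt):
            return label
    return "factual"

print(classify_intent("best running headphones"))       # best_top_n
print(classify_intent("notion vs obsidian for teams"))  # comparison
print(classify_intent("what is a citation index"))      # factual
```

Even this crude version makes the operational point visible: "best / top N" prompts are trivially detectable, so a brand can audit which of its target queries will come back as ranked shortlists.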
What drives citation rate
GolOps measured how page format correlates with citation frequency. Four measurable levers:
Listicle headlines → 1.2×. Pages built as "Top 10" or "5 Best 2026" are cited roughly 20% more often than standard product pages on the same topic. The driver is the content that typically sits beneath such a headline: it is usually structured for easy extraction.
Comparison headlines → 1.1×. Constructions like "X vs Y", "X or Y", "Comparing X and Y" form a separate category with its own lift. The link is direct: 9.7% of AI queries are comparative, and the model specifically searches for such sources.
Instructional headlines → 1.1×. "How to", "Guide to", "Step by step" formats receive comparable lift, matching the "How to" intent (3.1% of queries).
Year in the headline → 1.1×. "Best platforms in 2026", "2026 guide" — simply including the current year raises citation rates by around 10%. A freshness signal.
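The four headline levers above can be checked mechanically against a page title. A sketch of such a detector follows; the patterns, including the loose "any 202x year" rule standing in for the current year, are assumptions for illustration, not GolOps's measurement rules:

```python
import re

# Illustrative detectors for the four headline levers reported above.
# Patterns are demonstration assumptions, not the actual classification.
LEVERS = {
    "listicle":      re.compile(r"\b(?:top|best)\s*\d+\b|\b\d+\s+best\b", re.I),
    "comparison":    re.compile(r"\bvs\.?\b|\bversus\b|\bcomparing\b", re.I),
    "instructional": re.compile(r"\bhow to\b|\bguide\b|\bstep[- ]by[- ]step\b", re.I),
    "year":          re.compile(r"\b202\d\b"),  # loose proxy for "current year"
}

def headline_levers(title: str) -> list[str]:
    """Return which of the four levers a page title triggers."""
    return [name for name, pattern in LEVERS.items() if pattern.search(title)]

print(headline_levers("Top 10 CRM Platforms in 2026"))              # ['listicle', 'year']
print(headline_levers("How to Choose a CRM: A Step-by-Step Guide")) # ['instructional']
```

A check like this is cheap to run across an entire sitemap, which makes headline format one of the few citation levers that can be audited before any content is rewritten.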
A parallel measurement: recency in the technology vertical. 25.3% of citations in tech-category responses point to content less than 60 days old; in other categories this figure is far lower. For tech audiences, aging pages lose citation value quickly and require regular refresh.
Brand mentions → 1.5×. A page on which the brand name appears is cited 1.5 times more often than a page without it. The lever works on and off your own domain: the mention must appear not only on your site but also in third-party authoritative contexts.
Methodology
What underpins the numbers:
- 15 million citations — all drawn from live responses of AI systems with web search, deployed in production. The source is actual model output; simulations and offline snapshots are not included in the sample.
- 1,174 observed brands — a sample across industries and sizes, from global corporations to mid-market B2B players.
- 265,000 domains — the full set of sources models referenced within the window.
- Rolling 90-day window — data refreshes daily; trends are computed on fresh material.
- Measurement sources — modern large language models with web-search capabilities, available in commercial interfaces.
The guiding principle: what is measurable is what the user actually saw in the answer window. Theoretical model potential and ideal-condition outputs sit outside the sample.
Translating the problem into decision-maker language
If 17% of all citations go to one source and the remaining 83% spread across tens of thousands of players, what companies face is an infrastructure shift that rewrites the rules of market entry. Marketing tools alone do not resolve that shift.
AI visibility is determined by how your presence is architected in the sources a model relies on; advertising activity contributes weakly. A place on the AI shortlist follows from how well the brand is embedded in the citation structure — name recognition alone is insufficient. And the 83% stability of top sources opens a window for those who start building the loop now: two or three years from now, the field will harden, and the cost of entry will rise.
This is an infrastructure layer. It measures a company's presence in the field of choice and the manner of its representation in AI output.
GolOps turns these variables into a manageable layer. Measurement, interpretation, action and re-measurement run on a single database and through a single interface.