GolOps

Where AI Finds Its Answers. The Anatomy of 15 Million Citations

GolOps research — 15M citations from AI output, 1,174 brands, 265K domains. The map of sources large language models rely on, and the field of choice where AI visibility is decided.

GolOps Team
GolOps Lab


When a large language model answers a user, it draws links from a narrow, concentrated field of sources. That field is not open to everyone: it formed without the participation of most companies, and it operates by rules that do not reduce to advertising budgets.

GolOps measured this field. 15 million citations from live AI responses, 1,174 observed brands, 265,000 unique domains, 1,050,000 links — over a 90-day window. The data source is the actual responses of AI systems, the ones real users saw.

| Metric | Value |
| --- | --- |
| Citations analyzed | 15,000,000+ |
| Domains tracked | 265,000 |
| Brands in sample | 1,174 |
| Unique URLs | 1,050,000 |

Data window: 90 days

Key findings

17% — Wikipedia's share. One source holds a sixth of the entire visible field of AI output. At this level of concentration, ordinary visibility tactics lose their meaning. The real competition plays out for the remaining 83% of the citation flow.

60,000+ — the long tail. That is how many domains divide the remaining 83% among them. The real infrastructure of citation begins with entry into the top thousand domains.

1.2× — listicle headlines. Pages built around "Top N" or "N best 2026" formats are cited noticeably more often than standard product pages. Headline format is a measurable lever.

83% — top-tier stability. Most sources at the top of the distribution have held their positions for six months. AI choice is inertial — the window for entering the top tier is shorter than it looks.

A map of top sources

Over a 30-day window — the fifteen domains with the most citations:

| # | Domain | Citations | Source type |
| --- | --- | --- | --- |
| 1 | youtube.com | 236,322 | Video / UGC |
| 2 | en.wikipedia.org | 88,807 | Reference |
| 3 | reddit.com | 83,578 | Social platform |
| 4 | forbes.com | 28,382 | Media |
| 5 | pmc.ncbi.nlm.nih.gov | 26,905 | Academic |
| 6 | linkedin.com | 25,564 | Social platform |
| 7 | gartner.com | 25,444 | Industry analytics |
| 8 | edmunds.com | 23,997 | Vertical aggregator |
| 9 | g2.com | 22,638 | Review platform |
| 10 | facebook.com | 18,737 | Social platform |
| 11 | clutch.co | 17,087 | B2B directory |
| 12 | cars.com | 16,822 | Vertical aggregator |
| 13 | carfax.com | 14,223 | Vertical aggregator |
| 14 | nerdwallet.com | 13,902 | Finance aggregator |
| 15 | tripadvisor.com | 13,631 | Review platform |

These are not the "top media" and not a list of thought leaders. The top of the distribution is held by structured knowledge bases, video platforms, review aggregators, and narrow vertical references. AI turns to places where data is marked up and verifiable. A loud brand without that structural markup does not get pulled into the answer.

The power law of distribution

The full sample of 15M links follows a classic power-law distribution.

| Position | Domain | Share | Citations |
| --- | --- | --- | --- |
| 1 | en.wikipedia.org | 4.26% | 639,396 |
| 2 | youtube.com | 2.64% | 396,239 |
| 3 | reddit.com | 0.96% | 144,320 |
| 4 | forbes.com | 0.44% | 66,708 |
| 5 | linkedin.com | 0.37% | 55,529 |
| 6 | techradar.com | 0.35% | 52,055 |
| 7 | g2.com | 0.33% | 49,091 |
| 8 | gartner.com | 0.31% | 46,428 |
| 9 | pmc.ncbi.nlm.nih.gov | 0.29% | 43,902 |
| 10 | edmunds.com | 0.24% | 35,884 |
| 11 | clutch.co | 0.22% | 32,739 |
| 12 | facebook.com | 0.20% | 29,635 |
| 13 | nerdwallet.com | 0.19% | 28,937 |
| 14 | cars.com | 0.17% | 24,892 |
| 15 | tripadvisor.com | 0.15% | 22,625 |

Wikipedia holds over 4% of all URL citations; the next source is half that. By position twenty, the share drops below 0.15%; by position one hundred, below 0.06%. After that comes the long tail of tens of thousands of domains, each capturing fractions of a per mille.

The useful reading of this curve is not "become Wikipedia". The real competition for AI visibility plays out in the 0.01–0.1% share band, and that is where you find large vertical resources, niche references, corporate blogs, and specialized aggregators. This is the manageable field of choice.
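The shape of the curve can be checked directly against the table above. A minimal sketch, using only the top-five shares from the full-sample table (the counts are the article's; the fit itself is illustrative, since a real estimate would run across all 265K domains):

```python
import math

# Top-5 citation counts from the full-sample table above; total ~15M.
# Fitting only the head of the distribution -- illustrative, not a
# rigorous tail estimate.
counts = [639_396, 396_239, 144_320, 66_708, 55_529]
total = 15_000_000

shares = [c / total for c in counts]

# Fit share ~ C * rank^(-alpha) by least squares in log-log space.
xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
ys = [math.log(s) for s in shares]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    / sum((x - mean_x) ** 2 for x in xs)
)
alpha = -slope  # decay exponent of the head of the curve
```

An exponent well above 1 means shares collapse quickly with rank, which is exactly the picture described above: a short head, then tens of thousands of domains holding fractions of a per mille each.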

Source types

When 15M citations are broken down by category, the picture stops matching marketing intuition:

| Category | Share | Examples |
| --- | --- | --- |
| Vertical resources and other | 86.5% | gartner.com, edmunds.com, clutch.co, nerdwallet.com |
| Social platforms / UGC | 4.7% | youtube.com, reddit.com, linkedin.com, facebook.com, tiktok.com |
| References and encyclopedias | 4.5% | en.wikipedia.org, investopedia.com, de.wikipedia.org |
| Media | 1.1% | forbes.com, reuters.com, axios.com, businessinsider.com |
| Review platforms | 1.0% | g2.com, tripadvisor.com, m.yelp.com, consumerreports.org |
| Tech publications | 0.6% | techradar.com, wired.com, tomsguide.com, theverge.com |
| Academic publications | 0.5% | pmc.ncbi.nlm.nih.gov, sciencedirect.com, arxiv.org |
| App stores | 0.2% | apps.apple.com, play.google.com |
| Documentation | 0.2% | aws.amazon.com, learn.microsoft.com |
| Press releases | 0.2% | prnewswire.com, businesswire.com |
| E-commerce | 0.2% | amazon.com, shopify.com, walmart.com |
| Market research | 0.2% | marketsandmarkets.com, mordorintelligence.com |
| Developer | 0.2% | github.com, dev.to, stackoverflow.com |

The "Vertical resources and other" category — 86.5% — is where most companies actually compete. All industry sites, corporate resources, specialized references, trade platforms, aggregators — everything that does not fit standard labels. This is the real infrastructure of citation, and this is what requires a control loop.

The phantom of social platforms

A persistent industry narrative says: "Reddit is the key to AI visibility, UGC wins, forums are the gold source." This narrative does not survive contact with the data.

Social platforms and UGC together account for 4.7% of citations. Reddit alone — 0.96%. References as a class deliver 4.5% — and those 4.5% do more structured work in practice: a single Wikipedia citation in a factual query often forms the entire backbone of the answer.

Large language models reach for Reddit and YouTube when the query is subjective: "best running headphones" or "honest user reviews". In factual and B2B queries, social platforms disappear.

The reason is simple. AI prefers sources with predictable structure — an extractable answer and a single markup format. Wikipedia wins on exactly that: its content is laid out to be pulled directly. Social platforms are dialogic and contradictory, without a stable citation frame. For a short answer in a chat window, they are too noisy.

This does not amount to "forget Reddit". It points somewhere else: AI visibility cannot rest on a UGC layer alone. Social platforms cover up to 5% of the field. The other 95% plays by different rules.
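What "predictable structure" means in practice can be illustrated with schema.org markup. A hypothetical sketch of the ItemList JSON-LD a listicle page might embed so an extractor can pull a ranked answer without parsing prose (the field names follow schema.org conventions; the list entries are invented):

```python
import json

# Invented example entries for a "best X" listicle page.
items = ["Acme CRM", "Beta Suite", "Gamma Desk"]

# schema.org ItemList: the kind of extractable structure the article
# says AI systems favor over free-form, dialogic content.
item_list = {
    "@context": "https://schema.org",
    "@type": "ItemList",
    "itemListElement": [
        {"@type": "ListItem", "position": i + 1, "name": name}
        for i, name in enumerate(items)
    ],
}

json_ld = json.dumps(item_list, indent=2)
```

The same list in paragraph form forces a model to reconstruct the ranking from prose; the markup makes position and name directly addressable.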

Query intent

GolOps broke down 23,093 unique prompts by intent type. The distribution shows how users actually turn to AI:

| Query type | Share | What the user is looking for |
| --- | --- | --- |
| "Best / top N" | 35.1% | Ready-made shortlists, rankings, recommendations |
| Comparison "X vs Y" | 9.7% | Paired evaluations, choice between alternatives |
| "How to" | 3.1% | Step-by-step instructions |
| "Find / discover" | 2.0% | New, previously unknown options |
| "Alternatives to X" | 0.2% | Replacement for a known solution |
| Factual | 49.9% | Definitions, numbers, references |

One in three queries is a search for a list. That means an AI answer about your category will, with high probability, arrive as a ranked shortlist. The deciding condition for inclusion is whether you are mentioned in the sources the model relies on when assembling that list.
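A taxonomy like the one above can be approximated with simple pattern matching over prompts. A rough sketch, where the patterns are illustrative assumptions rather than GolOps's actual classifier, and "factual" is the fallback bucket (consistent with it covering roughly half of queries):

```python
import re

# Illustrative regexes per intent bucket; checked in order, first hit wins.
INTENT_PATTERNS = [
    ("best_top_n",   re.compile(r"\b(best|top\s*\d+)\b", re.I)),
    ("comparison",   re.compile(r"\b(vs\.?|versus)\b", re.I)),
    ("how_to",       re.compile(r"^how\s+to\b", re.I)),
    ("alternatives", re.compile(r"\balternatives?\s+to\b", re.I)),
    ("discovery",    re.compile(r"\b(find|discover)\b", re.I)),
]

def classify(prompt: str) -> str:
    """Assign a prompt to the first matching intent bucket."""
    for label, pattern in INTENT_PATTERNS:
        if pattern.search(prompt):
            return label
    return "factual"  # everything else: definitions, numbers, references
```

A real classifier would need language coverage and disambiguation far beyond this, but even the crude version makes the operational point: a third of traffic asks for a ranked list, and your category answer will arrive in that shape.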

What drives citation rate

GolOps measured how page format correlates with citation frequency. Five measurable levers stand out:

Listicle headlines → 1.2×. Pages built as "Top 10" or "5 Best 2026" are cited roughly 20% more often than standard product pages on the same topic. The driver lies in the content format such a headline tends to sit above — it is usually structured for easy extraction.

Comparison headlines → 1.1×. Constructions like "X vs Y", "X or Y", "Comparing X and Y" form a separate category with its own lift. The link is direct: 9.7% of AI queries are comparative, and the model specifically searches for such sources.

Instructional headlines → 1.1×. "How to", "Guide to", "Step by step" formats receive comparable lift, matching the "How to" intent (3.1% of queries).

Year in the headline → 1.1×. "Best platforms in 2026", "2026 guide" — simply including the current year raises citation rates by around 10%. A freshness signal.

A parallel measurement concerns recency in the technology vertical. 25.3% of citations in tech-category responses point to content less than 60 days old; in other categories, this figure is much lower. For tech audiences, stale pages are depreciating assets that require regular refresh.

Brand mentions → 1.5×. A page on which the brand name appears is cited 1.5 times more often than a page without it. This works in both directions: the mention must exist not only on your own domain, but in third-party authoritative contexts.
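Taken together, the headline levers can be sketched as a scoring pass over titles. The multipliers are the article's measured lifts; treating them as independent and multiplying them is an assumption of this sketch, and the regexes are illustrative:

```python
import re

# (name, pattern, measured lift) per the article's headline levers.
# Brand mentions (1.5x) operate at page level, not headline level,
# so they are omitted here.
LEVERS = [
    ("listicle",   re.compile(r"\b(top|best)\s*\d+\b|\b\d+\s+best\b", re.I), 1.2),
    ("comparison", re.compile(r"\bvs\.?\b|\bversus\b|\bcomparing\b", re.I),  1.1),
    ("how_to",     re.compile(r"\bhow to\b|\bguide\b|\bstep by step\b", re.I), 1.1),
    ("year",       re.compile(r"\b20\d{2}\b"), 1.1),
]

def headline_lift(title: str) -> float:
    """Expected citation-rate multiplier for a headline, assuming
    the levers combine independently (an assumption of this sketch)."""
    lift = 1.0
    for _name, pattern, mult in LEVERS:
        if pattern.search(title):
            lift *= mult
    return round(lift, 3)
```

For example, a title like "Top 10 CRM Platforms in 2026" triggers both the listicle and year levers, while a plain product-page title triggers none.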

Methodology

What underpins the numbers:

  • 15 million citations — all drawn from live responses of AI systems with web search, deployed in production. The source is actual model output; simulations and offline snapshots are not included in the sample.
  • 1,174 observed brands — a sample across industries and sizes, from global corporations to mid-market B2B players.
  • 265,000 domains — the full set of sources models referenced within the window.
  • Rolling 90-day window — data refreshes daily; trends are computed on fresh material.
  • Measurement sources — modern large language models with web-search capabilities, available in commercial interfaces.

The guiding principle: what is measurable is what the user actually saw in the answer window. Theoretical model potential and ideal-condition outputs sit outside the sample.

Translating the problem into decision-maker language

If 17% of all citations go to one source and the remaining 83% spread across tens of thousands of players, what companies face is an infrastructure shift that rewrites the rules of market entry. Marketing instruments do not resolve that shift.

AI visibility is determined by how your presence is architected in the sources a model relies on; advertising activity contributes weakly. A place on the AI shortlist follows from how well the brand is embedded in the citation structure — name recognition alone is insufficient. And the 83% stability of top sources opens a window for those who start building the loop now: two or three years from now, the field will harden, and the cost of entry will rise.

This is an infrastructure problem: what needs measuring is a company's presence in the field of choice and the manner of its representation in AI output.

GolOps turns these variables into a manageable layer. Measurement, interpretation, action and re-measurement run on a single database and through a single interface.

Request an AI visibility diagnostic →