Where AI Finds Its Answers. The Anatomy of 15 Million Citations
GolOps research — 15M citations from AI output, 1,174 brands, 265K domains. The map of sources large language models rely on, and the field of choice where AI visibility is decided.

When a large language model answers a user, it draws its links from a narrow, concentrated field of sources. Access to that field is limited: it formed without the participation of most companies, and it operates by rules that do not reduce to advertising budgets.
GolOps measured this field. 15 million citations from live AI responses, 1,174 observed brands, 265,000 unique domains, 1,050,000 links — over a 90-day window. The data source is the actual responses of AI systems, the ones real users saw.
| Metric | Value |
|---|---|
| Citations analyzed | 15,000,000+ |
| Domains tracked | 265,000 |
| Brands in sample | 1,174 |
| Unique URLs | 1,050,000 |
Data window: 90 days
Key findings
17% — Wikipedia's share. One source holds a sixth of the entire visible field of AI output. At this level of concentration, ordinary visibility tactics lose their meaning. The real competition plays out for the remaining 83% of the citation flow.
60,000+ — the long tail. That is how many domains divide that 83% among themselves. The real infrastructure of citation begins with entry into the first thousand domains.
1.2× — listicle headlines. Pages built around "Top N" or "N best 2026" formats are cited about 20% more often than standard product pages. Headline format is a measurable lever.
83% — top-tier stability. Most sources at the top of the distribution have held their positions for six months. AI choice is inertial — the window for entering the top tier is shorter than it looks.
A map of top sources
Over a 30-day window — the fifteen domains with the most citations:
| # | Domain | Citations | Source type |
|---|---|---|---|
| 1 | youtube.com | 236,322 | Video / UGC |
| 2 | en.wikipedia.org | 88,807 | Reference |
| 3 | reddit.com | 83,578 | Social platform |
| 4 | forbes.com | 28,382 | Media |
| 5 | pmc.ncbi.nlm.nih.gov | 26,905 | Academic |
| 6 | linkedin.com | 25,564 | Social platform |
| 7 | gartner.com | 25,444 | Industry analytics |
| 8 | edmunds.com | 23,997 | Vertical aggregator |
| 9 | g2.com | 22,638 | Review platform |
| 10 | facebook.com | 18,737 | Social platform |
| 11 | clutch.co | 17,087 | B2B directory |
| 12 | cars.com | 16,822 | Vertical aggregator |
| 13 | carfax.com | 14,223 | Vertical aggregator |
| 14 | nerdwallet.com | 13,902 | Finance aggregator |
| 15 | tripadvisor.com | 13,631 | Review platform |
These are not the "top media" and not a list of thought leaders. The top of the distribution is held by structured knowledge bases, video platforms, review aggregators, and narrow vertical references. AI turns to places where data is marked up and checkable. A loud brand without that structural markup does not get pulled into the sample.
The power law of distribution
The full sample of 15M links forms a classic power-law function.
| Position | Domain | Share | Citations |
|---|---|---|---|
| 1 | en.wikipedia.org | 4.26% | 639,396 |
| 2 | youtube.com | 2.64% | 396,239 |
| 3 | reddit.com | 0.96% | 144,320 |
| 4 | forbes.com | 0.44% | 66,708 |
| 5 | linkedin.com | 0.37% | 55,529 |
| 6 | techradar.com | 0.35% | 52,055 |
| 7 | g2.com | 0.33% | 49,091 |
| 8 | gartner.com | 0.31% | 46,428 |
| 9 | pmc.ncbi.nlm.nih.gov | 0.29% | 43,902 |
| 10 | edmunds.com | 0.24% | 35,884 |
| 11 | clutch.co | 0.22% | 32,739 |
| 12 | facebook.com | 0.20% | 29,635 |
| 13 | nerdwallet.com | 0.19% | 28,937 |
| 14 | cars.com | 0.17% | 24,892 |
| 15 | tripadvisor.com | 0.15% | 22,625 |
Wikipedia holds over 4% of all URL citations; the next source holds half that. By position twenty, the share drops below 0.15%; by position one hundred, below 0.06%. After that comes the long tail of tens of thousands of domains, each capturing hundredths of a percent or less.
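The share column follows directly from the raw counts. A quick sketch using the table's own numbers, with the top five rows and the full 15M-citation sample as the denominator:

```python
# Reproducing the share column from the raw citation counts. Counts are
# the top-five rows of the table above; the denominator is the full
# 15M-citation sample.
TOTAL_CITATIONS = 15_000_000

top_domains = {
    "en.wikipedia.org": 639_396,
    "youtube.com": 396_239,
    "reddit.com": 144_320,
    "forbes.com": 66_708,
    "linkedin.com": 55_529,
}

for domain, count in top_domains.items():
    print(f"{domain:<20} {count / TOTAL_CITATIONS:.2%}")
# en.wikipedia.org comes out at 4.26%, matching the table;
# by the fifth position the share is already below 0.4%.
```

The steepness is the point: each step down the ranking roughly halves the share, which is why the tail of the curve, not the head, is where most domains live.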
The useful reading of this curve is not "become Wikipedia". The real competition for AI visibility plays out in the 0.01–0.1% band, and that is where you find large vertical resources, niche references, corporate blogs, and specialized aggregators. This is the manageable field of choice.
Source types
When 15M citations are broken down by category, the picture stops matching marketing intuition:
| Category | Share | Examples |
|---|---|---|
| Vertical resources and other | 86.5% | gartner.com, edmunds.com, clutch.co, nerdwallet.com |
| Social platforms / UGC | 4.7% | youtube.com, reddit.com, linkedin.com, facebook.com, tiktok.com |
| References and encyclopedias | 4.5% | en.wikipedia.org, investopedia.com, de.wikipedia.org |
| Media | 1.1% | forbes.com, reuters.com, axios.com, businessinsider.com |
| Review platforms | 1.0% | g2.com, tripadvisor.com, m.yelp.com, consumerreports.org |
| Tech publications | 0.6% | techradar.com, wired.com, tomsguide.com, theverge.com |
| Academic publications | 0.5% | pmc.ncbi.nlm.nih.gov, sciencedirect.com, arxiv.org |
| App stores | 0.2% | apps.apple.com, play.google.com |
| Documentation | 0.2% | aws.amazon.com, learn.microsoft.com |
| Press releases | 0.2% | prnewswire.com, businesswire.com |
| E-commerce | 0.2% | amazon.com, shopify.com, walmart.com |
| Market research | 0.2% | marketsandmarkets.com, mordorintelligence.com |
| Developer | 0.2% | github.com, dev.to, stackoverflow.com |
The "Vertical resources and other" category — 86.5% — is where most companies actually compete. All industry sites, corporate resources, specialized references, trade platforms, aggregators — everything that does not fit standard labels. This is the real infrastructure of citation, and this is what requires a control loop.
The phantom of social platforms
A persistent industry narrative says: "Reddit is the key to AI visibility, UGC wins, forums are the gold source." This narrative does not survive contact with the data.
Social platforms and UGC together account for 4.7% of citations; Reddit alone, 0.96%. References as a class deliver 4.5%, and that 4.5% does heavier lifting in practice: a single Wikipedia citation in a factual query often forms the backbone of the entire answer.
Large language models reach for Reddit and YouTube when the query is subjective: "best running headphones" or "honest user reviews". In factual and B2B queries, social platforms disappear.
The reason is simple. AI prefers sources with predictable structure: an extractable answer in a consistent markup format. Wikipedia wins on exactly that; its content is laid out to be pulled directly. Social platforms are dialogic and contradictory, with no stable citation frame. For a short answer in a chat window, they are too noisy.
This does not amount to "forget Reddit". It points somewhere else: AI visibility cannot rest on a UGC layer alone. Social platforms cover up to 5% of the field. The other 95% plays by different rules.
Query intent
GolOps broke down 23,093 unique prompts by intent type. The distribution shows how users actually turn to AI:
| Query type | Share | What the user is looking for |
|---|---|---|
| "Best / top N" | 35.1% | Ready-made shortlists, rankings, recommendations |
| Comparison "X vs Y" | 9.7% | Paired evaluations, choice between alternatives |
| "How to" | 3.1% | Step-by-step instructions |
| "Find / discover" | 2.0% | New, previously unknown options |
| "Alternatives to X" | 0.2% | Replacement for a known solution |
| Factual | 49.9% | Definitions, numbers, references |
One in three queries is a search for a list. That means an AI answer about your category will, with high probability, arrive as a ranked shortlist. The deciding condition for inclusion is whether you are mentioned in the sources the model relies on when assembling that list.
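The intent buckets above can be illustrated with a minimal pattern-matching sketch. The regex rules here are illustrative assumptions, not the actual GolOps classifier; anything unmatched falls into the residual "factual" bucket, which the data shows is roughly half the sample:

```python
import re

# Illustrative intent bucketing in the spirit of the table above.
# Patterns are assumptions for demonstration; "factual" is the
# catch-all for prompts that match no explicit intent marker.
INTENT_PATTERNS = [
    ("best_top_n",   re.compile(r"\b(?:best|top\s*\d+)\b", re.I)),
    ("comparison",   re.compile(r"\bvs\.?\b|\bversus\b|\bcompared?\s+(?:to|with)\b", re.I)),
    ("how_to",       re.compile(r"^how\s+(?:to|do|can)\b", re.I)),
    ("alternatives", re.compile(r"\balternatives?\s+to\b", re.I)),
    ("discover",     re.compile(r"\b(?:find|discover)\b", re.I)),
]

def classify_intent(prompt: str) -> str:
    for label, pattern in INTENT_PATTERNS:
        if pattern.search(prompt):
            return label
    return "factual"

print(classify_intent("best running headphones"))       # best_top_n
print(classify_intent("notion vs obsidian for teams"))  # comparison
print(classify_intent("what is a citation index"))      # factual
```

Even this crude version makes the operational point visible: "best / top N" prompts are trivially detectable, so a brand can audit which of its target queries will come back as ranked shortlists.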
What drives citation rate
GolOps measured how page format correlates with citation frequency. Four measurable levers:
Listicle headlines → 1.2×. Pages built as "Top 10" or "5 Best 2026" are cited roughly 20% more often than standard product pages on the same topic. The driver is the content that typically sits beneath such a headline: it is usually structured for easy extraction.
Comparison headlines → 1.1×. Constructions like "X vs Y", "X or Y", "Comparing X and Y" form a separate category with its own lift. The link is direct: 9.7% of AI queries are comparative, and the model specifically searches for such sources.
Instructional headlines → 1.1×. "How to", "Guide to", "Step by step" formats receive comparable lift, matching the "How to" intent (3.1% of queries).
Year in the headline → 1.1×. "Best platforms in 2026", "2026 guide" — simply including the current year raises citation rates by around 10%. A freshness signal.
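The four headline levers above can be checked mechanically against a page title. A sketch of such a detector follows; the patterns, including the loose "any 202x year" rule standing in for the current year, are assumptions for illustration, not GolOps's measurement rules:

```python
import re

# Illustrative detectors for the four headline levers reported above.
# Patterns are demonstration assumptions, not the actual classification.
LEVERS = {
    "listicle":      re.compile(r"\b(?:top|best)\s*\d+\b|\b\d+\s+best\b", re.I),
    "comparison":    re.compile(r"\bvs\.?\b|\bversus\b|\bcomparing\b", re.I),
    "instructional": re.compile(r"\bhow to\b|\bguide\b|\bstep[- ]by[- ]step\b", re.I),
    "year":          re.compile(r"\b202\d\b"),  # loose proxy for "current year"
}

def headline_levers(title: str) -> list[str]:
    """Return which of the four levers a page title triggers."""
    return [name for name, pattern in LEVERS.items() if pattern.search(title)]

print(headline_levers("Top 10 CRM Platforms in 2026"))              # ['listicle', 'year']
print(headline_levers("How to Choose a CRM: A Step-by-Step Guide")) # ['instructional']
```

A check like this is cheap to run across an entire sitemap, which makes headline format one of the few citation levers that can be audited before any content is rewritten.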
A parallel measurement: recency in the technology vertical. 25.3% of citations in tech-category responses point to content less than 60 days old; in other categories this figure is far lower. For tech audiences, aging pages lose citation value quickly and require regular refresh.
Brand mentions → 1.5×. A page on which the brand name appears is cited 1.5 times more often than a page without it. The lever works on and off your own domain: the mention must appear not only on your site but also in third-party authoritative contexts.
Methodology
What underpins the numbers:
- 15 million citations — all drawn from live responses of AI systems with web search, deployed in production. The source is actual model output; simulations and offline snapshots are not included in the sample.
- 1,174 observed brands — a sample across industries and sizes, from global corporations to mid-market B2B players.
- 265,000 domains — the full set of sources models referenced within the window.
- Rolling 90-day window — data refreshes daily; trends are computed on fresh material.
- Measurement sources — modern large language models with web-search capabilities, available in commercial interfaces.
The guiding principle: what is measurable is what the user actually saw in the answer window. Theoretical model potential and ideal-condition outputs sit outside the sample.
Translating the problem into decision-maker language
If 17% of all citations go to one source and the remaining 83% spread across tens of thousands of players, what companies face is an infrastructure shift that rewrites the rules of market entry. Marketing tools alone do not resolve that shift.
AI visibility is determined by how your presence is architected in the sources a model relies on; advertising activity contributes weakly. A place on the AI shortlist follows from how well the brand is embedded in the citation structure — name recognition alone is insufficient. And the 83% stability of top sources opens a window for those who start building the loop now: two or three years from now, the field will harden, and the cost of entry will rise.
This is an infrastructure layer. It measures a company's presence in the field of choice and the manner of its representation in AI output.
GolOps turns these variables into a manageable layer. Measurement, interpretation, action and re-measurement run on a single database and through a single interface.