When AI comes to your website. The anatomy of 600K crawler visits

Before an AI system cites a company, its crawler has to arrive at the site, read a page, and decide what to do with it. Most companies do not know which bots visit them, how often, or what exactly they take. And the crawl looks nothing like Google's.

GolOps broke down that behavior: 575,788 AI crawler visits from 7 systems (GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, Bytespider) between June 2025 and February 2026. The data comes from server-side access logs: actual bot requests, not estimates or simulations.

Metric	Value
Crawler visits analyzed	575,788+
AI crawlers tracked	7
OpenAI share of traffic	72.3%
Pages visited exactly once	88.5%

Data window: June 2025 — February 2026

Key findings

72.3% of traffic comes from OpenAI. GPTBot and OAI-SearchBot together account for nearly three quarters of all AI crawler visits, roughly four times the combined share of Anthropic, Google, Perplexity, Meta, and Amazon. Optimize for the wrong crawl and you optimize for the wrong system.

AI crawlers skip your homepage. ChatGPT's training crawler enters through the homepage only 2.8% of the time and goes straight into depth: articles, documentation, product pages. ClaudeBot behaves differently, starting at the homepage 19.2% of the time, a top-down crawl. Different systems read a site by different logic.

88.5% of pages get exactly one visit. Most crawling is one-and-done. Your content has to be ready before the bot arrives, because there may be no second chance. Fixing a page after publication rarely helps; what matters is first-crawl readiness.

The blog is the new front door. ChatGPT's search crawler starts a session on a blog page 21% of the time, against 1% for the homepage. AI search pulls an answer to a specific question, not your site hierarchy. What gets cited is the guide, not the landing page.

The three-click rule. More than half of all training traffic lands on pages at depth 3, and only 3.7% goes deeper. Content buried at depth 5+ rarely gets found. A flat architecture is a measurable advantage.

Methodology

The analysis is based on server-side access logs from sites connected to GolOps crawler monitoring. The window runs from June 2025 to February 2026 and covers more than 575,788 individual visits from GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, and other AI-identified bots. Data was anonymized and aggregated before analysis. Visit counts, page depth, entry points, and revisit frequency were computed per crawler, to separate the behavior of training crawlers from that of real-time search crawlers.
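A minimal sketch of the per-crawler counting step described above. It assumes logs in the common Apache/Nginx "combined" format. The token-to-crawler mapping and the purpose labels are illustrative assumptions that follow this article's training/search split, not vendor classifications. User-agent strings can also be spoofed, so a production pipeline would verify source IPs as well.

```python
import re
from collections import Counter

# Illustrative mapping: user-agent token -> (crawler name, purpose).
# Purpose labels are an assumption mirroring this article's training/search split.
# Google-Extended is a robots.txt control token rather than a separate user agent,
# so it never shows up in access logs and is not listed here.
CRAWLERS = {
    "GPTBot": ("ChatGPT Training", "training"),
    "OAI-SearchBot": ("ChatGPT Search", "search"),
    "ChatGPT-User": ("ChatGPT User", "user-triggered"),
    "ClaudeBot": ("Claude Training", "training"),
    "PerplexityBot": ("PerplexityBot", "search"),
    "Bytespider": ("Bytespider", "unclassified"),
}

# Combined Log Format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def classify(line):
    """Return (crawler, purpose, path) for an AI-crawler request, else None."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    agent = m.group("agent")
    for token, (name, purpose) in CRAWLERS.items():
        if token in agent:
            return name, purpose, m.group("path")
    return None


def count_by_crawler(lines):
    """Count AI-crawler visits per crawler over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        hit = classify(line)
        if hit is not None:
            counts[hit[0]] += 1
    return counts
```

Passing an open log file (for example, a hypothetical `access.log`) to `count_by_crawler` yields per-crawler totals of the kind shown in the crawler table.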

Two kinds of crawler: training versus search

A crawler's goal determines when and how your content reaches a user's answer.

Training crawlers collect data for future models. Your content shapes how AI answers months from now. The effect is delayed but long-lasting.

Search crawlers pull content in real time. When a user asks a question, the system fetches the page and cites it directly in the answer. The effect is immediate — a page can appear in a ChatGPT answer today.

Type	Traffic share	Effect	What it sets
Training crawlers	61%	Long-term	How AI describes you months from now
Search crawlers	15%	Immediate	Whether you appear in an answer today
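The two shares above can be recomputed from the visit counts in the crawler table further down. The short sketch below assumes, as this article does, that only GPTBot and ClaudeBot count as training and only OAI-SearchBot counts as search. It also makes explicit why the two rows do not sum to 100%: Bytespider, Meta, Amazon, and the remaining bots are not assigned a purpose here.

```python
# Visit counts from this article's crawler table (June 2025 to February 2026).
VISITS = {
    "GPTBot": 329_572,
    "OAI-SearchBot": 87_155,
    "Bytespider": 52_704,
    "Meta": 45_445,
    "Amazon": 38_335,
    "ClaudeBot": 22_074,
}
TOTAL = 575_788  # all AI crawler visits, including bots not listed in the table

# Purpose labels as used in this article; anything else counts as "other".
PURPOSE = {"GPTBot": "training", "ClaudeBot": "training", "OAI-SearchBot": "search"}


def share_by_purpose(visits, total):
    """Return the share of total crawler traffic per purpose."""
    shares = {"training": 0.0, "search": 0.0, "other": 0.0}
    for bot, n in visits.items():
        shares[PURPOSE.get(bot, "other")] += n / total
    # Visits from bots missing from the table are unclassified as well.
    shares["other"] += (total - sum(visits.values())) / total
    return shares


for purpose, share in share_by_purpose(VISITS, TOTAL).items():
    print(f"{purpose:<8} {share:.1%}")
```

Training comes out at about 61% and search at about 15%, matching the table. The remaining ~24% is the unclassified traffic.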

This is where the gap between crawling and citing opens up. The training crawl is an investment in a future position; the search crawl decides whether you are chosen right now. And the gap is enormous: according to Cloudflare, Anthropic makes roughly 70,900 HTML page requests for every single referral it sends back. Crawling happens far more often than citation.
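A site can estimate its own crawl-to-referral ratio from the same logs. The sketch below is an illustration under stated assumptions, not Cloudflare's method. It counts every request whose user agent contains a crawler token, and every human visit whose Referer host belongs to that vendor's assistant (claude.ai is assumed here for Anthropic). Cloudflare's figure counts HTML page requests only, so a closer comparison would first filter the logs to HTML responses.

```python
from urllib.parse import urlparse


def crawl_to_referral_ratio(records, bot_token, referrer_hosts):
    """records: iterable of (user_agent, referer) pairs from parsed access logs.

    Returns crawler requests per referred human visit, or None when the
    vendor sent no referrals in the sample.
    """
    crawls = referrals = 0
    for agent, referer in records:
        if bot_token in agent:
            crawls += 1
        elif urlparse(referer).hostname in referrer_hosts:
            referrals += 1
    return crawls / referrals if referrals else None


# Hypothetical example: six ClaudeBot requests and two click-throughs from claude.ai.
sample = [("Mozilla/5.0 (compatible; ClaudeBot/1.0)", "-")] * 6 + [
    ("Mozilla/5.0 (Macintosh)", "https://claude.ai/")
] * 2
print(crawl_to_referral_ratio(sample, "ClaudeBot", {"claude.ai"}))
```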

Who crawls your site

Across training and search combined, OpenAI controls 72.3% of all AI crawler traffic; Anthropic's ClaudeBot accounts for 3.8%. Bytespider, from TikTok's parent company ByteDance, quietly holds third place and crawls more sites than any other bot. Meta and Amazon round out the top five, but neither cracks 8%.

#	Crawler	Visits	Share
1	ChatGPT Training (GPTBot)	329,572	57.2%
2	ChatGPT Search (OAI-SearchBot)	87,155	15.1%
3	Bytespider (ByteDance/TikTok)	52,704	9.2%
4	Meta	45,445	7.9%
5	Amazon	38,335	6.7%
6	Claude Training (ClaudeBot)	22,074	3.8%

Ranked by traffic share. June 2025 — February 2026

OpenAI's ratio of training to search crawling is 3.8 : 1. The bot market is split unevenly: one dominant system and a long tail of everyone else. Anyone who wants to manage their position in the field of choice starts with what the OpenAI crawler actually sees. Cloudflare publishes a breakdown of AI crawler traffic by purpose and industry, and it confirms that training and search behave as two distinct kinds of crawl.
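The ratios in this section follow directly from the visit counts in the table. The quick check below uses only the table's numbers. Google and Perplexity are not broken out there, so the "four times" comparison from the key findings is approximated against Anthropic, Meta, and Amazon.

```python
# Visit counts taken from the crawler table above.
VISITS = {
    "GPTBot": 329_572,
    "OAI-SearchBot": 87_155,
    "ClaudeBot": 22_074,
    "Meta": 45_445,
    "Amazon": 38_335,
}


def training_to_search_ratio(v):
    """OpenAI training visits per OpenAI search visit."""
    return v["GPTBot"] / v["OAI-SearchBot"]


def openai_vs_others(v):
    """OpenAI's combined visits relative to Anthropic, Meta and Amazon combined.

    Google and Perplexity are not broken out in the table, so they are left out.
    """
    openai = v["GPTBot"] + v["OAI-SearchBot"]
    others = v["ClaudeBot"] + v["Meta"] + v["Amazon"]
    return openai / others


print(f"training : search = {training_to_search_ratio(VISITS):.1f} : 1")
print(f"OpenAI vs. others = {openai_vs_others(VISITS):.1f}x")
```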

Different crawl philosophies

Training crawlers read a site differently, and the differences are not random.

Crawler	Homepage entries	Logic
Claude Training	19.2%	Figures out who you are first
ChatGPT Training	2.8%	Dives straight into content depth

Proportionally, Claude enters through the homepage about 7× as often as ChatGPT's training crawler (19.2% of entries versus 2.8%). It appears to want to understand who you are and where your authority lies first, hence the top-down crawl. ChatGPT skips the facade and goes for content. For Claude, that means the homepage has to answer "who is this company and what is it an authority on." For ChatGPT, catalog depth matters more.
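Homepage-entry shares like these depend on how a crawl "session" is defined, and the article does not specify that definition. The minimal sketch below assumes a session ends after 30 minutes of inactivity by the same crawler on the same site. This is a common web-analytics convention, used here only as a hypothetical default.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical session rule: a new session starts after 30 idle minutes by the
# same crawler on the same site. The article does not state its definition.
SESSION_GAP = timedelta(minutes=30)


def session_entries(hits):
    """hits: iterable of (crawler, site, timestamp, path) tuples.

    Returns {crawler: [entry path of each session]}.
    """
    by_crawler_site = defaultdict(list)
    for crawler, site, ts, path in hits:
        by_crawler_site[(crawler, site)].append((ts, path))

    entries = defaultdict(list)
    for (crawler, _site), visits in by_crawler_site.items():
        visits.sort()  # chronological order
        last_ts = None
        for ts, path in visits:
            if last_ts is None or ts - last_ts > SESSION_GAP:
                entries[crawler].append(path)  # first page of a new session
            last_ts = ts
    return dict(entries)


def homepage_entry_share(entries):
    """Share of each crawler's sessions that start at the homepage '/'."""
    return {c: sum(p == "/" for p in paths) / len(paths) for c, paths in entries.items()}


t0 = datetime(2025, 6, 10, 12, 0)
hits = [
    ("ClaudeBot", "example.com", t0, "/"),
    ("ClaudeBot", "example.com", t0 + timedelta(minutes=5), "/about"),
    ("ClaudeBot", "example.com", t0 + timedelta(hours=2), "/blog/post"),
    ("GPTBot", "example.com", t0, "/docs/api/auth"),
]
print(homepage_entry_share(session_entries(hits)))
```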

Timing reveals intent

Crawlers run on a schedule, and each system keeps its own.

Crawler	Weekday avg	Weekend avg	Change
ChatGPT Training	1,430	1,841	+29%
ChatGPT Search	383	540	+41%
Claude Training	99	91	−8%

OpenAI ramps up crawling on weekends, when human traffic drops, likely taking advantage of spare capacity. Claude does the opposite and is 8% less active on weekends. The practical takeaway: a weekday publish is likely to be picked up faster by Anthropic, a weekend publish faster by OpenAI.
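The weekday and weekend averages can be reproduced from daily visit counts. The sketch below assumes per-day totals have already been aggregated for one crawler, and that day boundaries follow the log's time zone (which shifts what counts as a weekend). The "Change" column is the weekend average divided by the weekday average, minus one.

```python
from datetime import date
from statistics import mean


def weekday_weekend_change(daily_counts):
    """daily_counts: {date: visits that day} for one crawler.

    Returns (weekday average, weekend average, relative change weekend vs weekday).
    """
    weekday = [n for d, n in daily_counts.items() if d.weekday() < 5]  # Mon-Fri
    weekend = [n for d, n in daily_counts.items() if d.weekday() >= 5]  # Sat-Sun
    wd, we = mean(weekday), mean(weekend)
    return wd, we, we / wd - 1


# Hypothetical daily counts: a Monday, a Tuesday and a Saturday in June 2025.
sample = {date(2025, 6, 2): 100, date(2025, 6, 3): 120, date(2025, 6, 7): 165}
print(weekday_weekend_change(sample))

# Reproducing the "Change" column from the table's averages.
TABLE = {"ChatGPT Training": (1_430, 1_841), "ChatGPT Search": (383, 540), "Claude Training": (99, 91)}
for name, (wd, we) in TABLE.items():
    print(f"{name}: {we / wd - 1:+.0%}")
```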

The blog is your front door into AI

ChatGPT's search crawler — the one that feeds real-time answers — starts a session on a blog page 21% of the time. This is not random crawling. When a user asks a question, the system deliberately fetches blog content.

Entry point	Share of sessions	What it is
Blog pages	21%	Articles, guides, breakdowns
Product pages	3%	Features, pricing, docs
Homepage	1%	Main landing page
Other	75%	All other entries

Entry through blog content is 21× more likely than through the homepage. The crawler is not indexing the site hierarchically; it is looking for an answer to a specific query. The first pages fetched are those that directly answer a question: "how to" guides, best practices, "X vs Y" comparisons. This is the new organic channel, and it rests on guides, comparisons, and instructions. We broke this gap down separately in "AI crawls your product pages, but it cites your blog."
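Grouping session entry URLs into the categories above requires site-specific rules. The sketch below uses hypothetical path prefixes (/blog/, /guides/, /docs/, /pricing and so on). These are placeholders, not the rules behind the table.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical prefix rules; every site needs its own.
CATEGORY_PREFIXES = [
    ("blog", ("/blog/", "/guides/", "/articles/")),
    ("product", ("/product/", "/features/", "/pricing", "/docs/")),
]
CATEGORIES = ("blog", "product", "homepage", "other")


def entry_category(url):
    """Map a session's first URL to blog / product / homepage / other."""
    path = urlsplit(url).path or "/"  # drop query strings such as ?utm_source=...
    if path == "/":
        return "homepage"
    for category, prefixes in CATEGORY_PREFIXES:
        if path.startswith(prefixes):
            return category
    return "other"


def entry_mix(entry_urls):
    """Share of sessions per entry category."""
    counts = Counter(entry_category(u) for u in entry_urls)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in CATEGORIES}
    return {c: counts[c] / total for c in CATEGORIES}
```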

Why they visit only once

Visits per URL	Share of URLs
1	88.5%
2	8.3%
3–5	2.4%
6–10	0.4%
11+	0.3%

88.5% of addresses are visited exactly once and never again within the observation window. Only about 3% of URLs (2.4% + 0.4% + 0.3%) get a third visit. Even at the 99th percentile, a URL gets no more than about five visits. The crawler treats a page as disposable: one look, no return.
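The revisit table is a histogram of visits per URL. The sketch below rebuilds it from a list of requested URLs. The open-ended top bucket is treated as 11 or more visits so that it does not overlap with 6–10. This is an assumption about how the table's buckets were cut.

```python
from collections import Counter

# (label, min visits, max visits or None for open-ended). The top bucket is
# assumed to start at 11 so buckets do not overlap.
BUCKETS = [("1", 1, 1), ("2", 2, 2), ("3-5", 3, 5), ("6-10", 6, 10), ("11+", 11, None)]


def revisit_distribution(requested_urls):
    """Share of distinct URLs falling into each visits-per-URL bucket."""
    visits_per_url = Counter(requested_urls)
    n_urls = len(visits_per_url)
    dist = {label: 0 for label, _, _ in BUCKETS}
    for count in visits_per_url.values():
        for label, lo, hi in BUCKETS:
            if count >= lo and (hi is None or count <= hi):
                dist[label] += 1
                break
    return {label: k / n_urls for label, k in dist.items()}


def share_revisited(requested_urls, min_visits):
    """Share of distinct URLs requested at least `min_visits` times."""
    visits_per_url = Counter(requested_urls)
    return sum(c >= min_visits for c in visits_per_url.values()) / len(visits_per_url)
```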

A hard rule follows. Content has to be ready at the moment of the first crawl: markup, structure, brand mention, freshness. It also has to be ready in HTML. Vercel's study found that major AI crawlers such as GPTBot and ClaudeBot do not execute JavaScript, so whatever isn't server-rendered, the bot never sees. Tweaking after publication has little effect, because by then the bot has already left and will not come back. We tested in a controlled experiment whether markup or the format itself matters more for that first crawl.
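One way to check first-crawl readiness is to look at a page the way a non-rendering crawler does: fetch the raw HTML and confirm that the sentences you want cited are in it. The sketch below uses only the Python standard library. The phrases to check and the User-Agent string are placeholders. Matching is deliberately simple: exact substrings, after whitespace is collapsed.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collects text from raw HTML, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def missing_without_js(raw_html, phrases):
    """Return the phrases that do NOT appear in the server-rendered text."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    text = " ".join(" ".join(parser.chunks).split())  # collapse whitespace
    return [p for p in phrases if p not in text]


def fetch_raw_html(url, user_agent="Mozilla/5.0 (compatible; raw-html-check)"):
    """Fetch the page exactly as served, with no JavaScript execution."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

In practice, you would call `missing_without_js(fetch_raw_html(url), [...])` with the key claims of a page before it goes live. Any phrase in the result exists only after JavaScript runs.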

The three-click rule

ChatGPT's training crawler follows site architecture. Mid-depth pages get the most attention; the homepage accounts for less than 3% of visits.

Depth	Example	Share of visits
Depth 0	`/`	2.7%
Depth 1	`/about`	10.3%
Depth 2	`/blog/post`	19.6%
Depth 3	`/blog/2024/post`	51.7%
Depth 3	`/docs/api/auth`	12.0%
Depth 4+	`/docs/api/v1/...`	3.7%

More than half of the crawl concentrates at depth 3, and only 3.7% reaches depth 4 or deeper. If your best content is buried at depth 5+, the chance the crawler finds it drops sharply. Keep important pages within three clicks of the homepage. A flat architecture is not a matter of taste; it is a measurable advantage in the crawl.
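Two different "depths" are easy to conflate here. The examples in the table match a count of URL path segments. The three-click rule, by contrast, is about navigation: how many links a crawler must follow from the homepage. The sketch below computes both. Click depth is found by breadth-first search over an internal link graph, which you would need to extract yourself; the graph below is hypothetical.

```python
from collections import deque
from urllib.parse import urlsplit


def path_depth(url):
    """Count non-empty URL path segments: '/' -> 0, '/blog/2024/post' -> 3."""
    return len([seg for seg in urlsplit(url).path.split("/") if seg])


def click_depth(links, home="/"):
    """Minimum number of clicks from `home` to every reachable page.

    links: {page: [pages it links to]}, i.e. the site's internal link graph.
    Breadth-first search visits pages in order of increasing distance.
    """
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def deeper_than(links, max_clicks=3, home="/"):
    """Reachable pages that sit more than `max_clicks` clicks from home."""
    return sorted(p for p, d in click_depth(links, home).items() if d > max_clicks)


# Hypothetical link graph: an old post reachable only through paginated archives.
links = {
    "/": ["/blog", "/about"],
    "/blog": ["/blog/page-2"],
    "/blog/page-2": ["/blog/page-3"],
    "/blog/page-3": ["/blog/old-post"],
}
print(path_depth("/blog/old-post"), click_depth(links)["/blog/old-post"])
print(deeper_than(links))
```

The old post sits at URL depth 2 but four clicks from the homepage. A short URL alone does not make a page easy to reach.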

Reach versus depth

Each crawler makes its own trade-off between breadth of coverage and depth of crawl.

Crawler	Site coverage	Visits per site	Strategy
ChatGPT Search	76%	1,362	Wide reach, moderate depth
ChatGPT Training	70%	5,586	Fewer sites, exhaustive crawl
Claude Training	56%	470	Selective, targeted

ChatGPT Search bets on breadth — crawling 76% of sites in the sample. ChatGPT Training bets on depth: fewer sites, but an average of 5,586 visits each. Claude is the most selective, at just 470 visits per site. An important detail for smaller companies: more sites are reachable through ChatGPT's search crawler (76%) than are deeply trained on by the training crawler (70%). The chance of appearing in a real-time answer is higher than the chance of entering the training sample.
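Coverage and visits per site are two different aggregates over the same per-site counts. The sketch below computes both. It assumes coverage means the share of sample sites a crawler visited at least once, and that visits per site is averaged over those visited sites only. Under that reading, the three rows of the table are mutually consistent: each implies a sample of roughly 84 sites. That number is an inference from the table, not a figure the data states.

```python
def reach_and_depth(visits_by_site, sites_in_sample):
    """visits_by_site: {site: visit count} for one crawler.

    Coverage = share of sample sites visited at least once.
    Depth = average visits per visited site.
    """
    visited = [n for n in visits_by_site.values() if n > 0]
    coverage = len(visited) / sites_in_sample
    per_site = sum(visited) / len(visited) if visited else 0.0
    return coverage, per_site


# Consistency check on the table: total visits, coverage, visits per site.
ROWS = {
    "ChatGPT Search": (87_155, 0.76, 1_362),
    "ChatGPT Training": (329_572, 0.70, 5_586),
    "Claude Training": (22_074, 0.56, 470),
}
for name, (total, coverage, per_site) in ROWS.items():
    implied_sites = total / (coverage * per_site)
    print(f"{name}: implied sample of about {implied_sites:.0f} sites")
```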

The management read

Crawling is not citing. Between the bot's arrival and your brand appearing in an answer lies a gap: the page has to be read, a usable fragment extracted, and that fragment selected at assembly time — a question of structure, not content volume.
One pass and it's over. The crawler visits 88.5% of pages exactly once; the window of influence is narrow and does not repeat, so content has to be ready before the bot arrives, not refined afterward.
The blog is the front door. ChatGPT Search enters through the blog 21× more often than through the homepage — what gets cited is the answer to a specific question, not the site hierarchy.

GolOps brings this layer under management: it measures position in the field of choice through the Choice Control Index, ties it to specific crawlers and scenarios, and translates the measurement into a prioritized plan. The Strategic Pilot closes the first cycle in 10–12 weeks, and the Command Center keeps the loop running across seven AI systems.

What the silence costs

The crawl is already running while your position in the field of choice stays unknown: the bot arrives, but the company has no idea which version of itself it read. Gartner forecasts that 90% of B2B purchasing will run through AI agents by 2028, and Semrush already reports AI-channel conversion running 4.4× higher than organic search. The cost of silence comes down to one number: OpenAI controls 72.3% of all crawling, and every visit from that crawler that finds an unprepared page is a chance for a competitor's page to be the one it reads and takes.

But getting cited is only half the battle — you still have to hold the citation:

The half-life of AI citations. How fast you stop being cited
