---
title: Do AI Crawlers Prefer Markdown? A Controlled Experiment
description: GolOps research on 9,033 pages randomized between Markdown and HTML, 28,000 live AI fetches, four crawlers. One crawler leans HTML by 29.4 percentage points, three stay flat, and the Markdown payload is 75.9% smaller.
date: 2026-01-14T00:00:00Z
lastmod: 2026-06-02T00:00:00Z
published: true
categories: [research, llm]
author: golops
---

Every few weeks someone states with full confidence that AI crawlers love Markdown. The next person, just as confident, says the format makes no difference. Almost nobody runs the test. GolOps ran it: same content, two formats, four crawlers. The crawlers do not agree with each other.

GolOps randomized 9,033 public pages between two surfaces — Markdown and HTML — and tracked what four different AI crawlers do with each version. On top of that sit 28,000 real-time fetches from ChatGPT-User. The headline result: of the four crawlers, one picked a side. And that one is GPTBot, OpenAI's training scraper, not the crawler that decides who gets cited in a live answer.

| Metric | Value |
|---|---|
| Pages randomized | 9,033 |
| Markdown / HTML split | 4,516 / 4,517 |
| Live AI fetches measured | 28,000 |
| Markdown payload saving | 75.9% |

*Surface is locked to the URL: the same address always serves the same format.*

## Key findings

**−29.4 pp: GPTBot leans HTML.** The only statistically settled result on the board. GPTBot reached 2.5% of Markdown pages versus 31.9% of HTML pages, a gap of 29.4 percentage points (p<0.001). But GPTBot is a training scraper. It tells you how the corpus for future models gets selected, not who gets cited today.

**+2.8 pp: OAI-SearchBot leans Markdown.** The crawler behind ChatGPT Search reaches Markdown pages slightly more often (47.2% versus 44.4%), but the gap is still inside the noise band (p=0.189). This is the one line worth keeping an eye on.

**−0.3 pp: ChatGPT-User sees no difference.** The real-time fetch takes Markdown and HTML at almost identical rates (76.7% versus 77.0%, p=0.859). At conversation time, format does not decide which page gets opened. The user's question does.

**75.9%: Markdown payload saving.** The same page in Markdown weighs roughly a quarter of the HTML version. Cheaper to fetch, faster to parse. If retrieval cost becomes a routing signal inside the labs, that gap starts to matter.

## The experiment design

Every public page gets one of two surfaces — Markdown or HTML — based on a hash of its URL. Same content, same canonical, different format. The crawlers do not know they are in a test; they just see a page. Because the surface is locked to the URL, the same address always serves the same format, so when a crawler comes back we know which arm it is on. That keeps the comparison clean.

| Parameter | Value |
|---|---|
| Pages in the test | 9,033 |
| Split | 50/50, random but stable (4,516 Markdown / 4,517 HTML) |
| Where the effect is measured | 2,249 recommendation pages — the most heavily crawled URLs |
| Assignment method | hash(URL + experiment ID) → Markdown or HTML |

Every public, indexable page with a stable URL enters the test. Login-walled routes, redirects, and preview pages are left out. The core measurement layer is recommendation pages at `/ai-recommends/<product>/<audience>` ("best AI transcription for nonprofits" and the like). There are thousands of them, which gives this slice enough volume to detect small effects.
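The assignment rule can be sketched in a few lines. This is a minimal illustration, not the GolOps implementation: the article only states `hash(URL + experiment ID)`, so the choice of SHA-256, the experiment ID string, the example path, and the `assign_surface` name are all assumptions.

```python
import hashlib

EXPERIMENT_ID = "md-vs-html-v1"  # hypothetical ID, not taken from the article


def assign_surface(url: str, experiment_id: str = EXPERIMENT_ID) -> str:
    """Deterministically map a URL to 'markdown' or 'html'."""
    digest = hashlib.sha256((url + experiment_id).encode("utf-8")).digest()
    # Read the first 8 bytes as an integer: even -> markdown, odd -> html.
    bucket = int.from_bytes(digest[:8], "big") % 2
    return "markdown" if bucket == 0 else "html"


if __name__ == "__main__":
    url = "/ai-recommends/transcription/nonprofits"  # illustrative path
    print(url, "->", assign_surface(url))
```

Because the result depends only on the URL and the experiment ID, the same address lands in the same arm on every request, and changing the experiment ID reshuffles the assignment for a new test.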

## OpenAI runs three crawlers, each doing a different job

Most write-ups blur OpenAI's crawlers into one. That is a measurement error. Each does a different job, runs on a different schedule, and responds to format differently. Per [Vercel](https://vercel.com/blog/the-rise-of-the-ai-crawler), AI crawlers fetch raw content and do not render JavaScript — so how you serve a page matters more than it looks. Count GPTBot fetches as evidence of live citations, or read ChatGPT-User numbers as search indexing, and you are measuring the wrong thing.

**OAI-SearchBot — the search index crawler.** Crawls steadily, like a search engine. Pulls pages into OpenAI's search index, the system that decides what surfaces inside ChatGPT Search. If you want to show up when ChatGPT searches the open web, this is the crawler whose preferences matter most.

**ChatGPT-User — live retrieval.** Opens a page in real time when someone in ChatGPT asks a question and the model decides it needs more context to answer. Pure conversation-time demand. Whatever the user asks, this is the bot that goes and fetches.

**GPTBot: the training-data scraper.** Comes in heavy bursts on a schedule. Pulls pages into the corpus used to train future versions of GPT. It tells you about training-pipeline preferences, not whether your page gets cited when a real user is talking to ChatGPT today. We broke down how these bots actually move across a site, and on what schedules, in [When AI comes to your website. The anatomy of 600K crawler visits](/en/publications/ai-crawler-discovery).
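Keeping the three bots apart in log analysis comes down to matching the token each one puts in its User-Agent header. A minimal sketch, assuming plain substring matching; the function name is illustrative and the role labels follow this article's own terms. A user-agent string can be spoofed, so treat the result as a label, not a verification.

```python
from typing import Optional, Tuple

# User-Agent token -> role, using the article's vocabulary.
OPENAI_BOT_ROLES = {
    "OAI-SearchBot": "search",
    "ChatGPT-User": "interaction",
    "GPTBot": "training",
}


def classify_openai_bot(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (bot token, role) if the header names an OpenAI bot, else None."""
    ua = user_agent.lower()
    for token, role in OPENAI_BOT_ROLES.items():
        if token.lower() in ua:
            return token, role
    return None


if __name__ == "__main__":
    # Shortened sample headers, for illustration only.
    samples = [
        "Mozilla/5.0; compatible; GPTBot/1.1; +https://openai.com/gptbot",
        "Mozilla/5.0; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot",
        "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0",
    ]
    for ua in samples:
        print(classify_openai_bot(ua), "<-", ua)
```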

## Results: one settled signal, three still moving

Across recommendation pages, counting the four crawlers plus ChatGPT-User live retrieval, one of the five bots shows a clear Markdown-versus-HTML preference. One leans but lacks the data to call it. Three sit flat, with about the same reach for both formats.

| Crawler | Type | Shift (pp) | Markdown coverage | HTML coverage | Significance |
|---|---|---|---|---|---|
| GPTBot | training | −29.4 | 2.5% (28 of 1,119) | 31.9% (361 of 1,130) | p<0.001 |
| OAI-SearchBot | search | +2.8 | 47.2% (528 of 1,119) | 44.4% (502 of 1,130) | p=0.189 |
| ChatGPT-User | interaction | −0.3 | 76.7% | 77.0% | p=0.859 (flat) |
| PerplexityBot | search | −1.3 | 8.4% (94 of 1,119) | 9.7% (110 of 1,130) | p=0.271 |
| ClaudeBot | training | −2.0 | 8.9% (100 of 1,119) | 11.0% (124 of 1,130) | p=0.107 |

GPTBot's big HTML lean is the only statistically settled result on the board, and we are careful with it. GPTBot feeds future model versions; it does not decide what gets cited today. Interesting, but too early to change a site over.

The line worth keeping an eye on is OAI-SearchBot. It is the crawler behind ChatGPT Search; when ChatGPT goes looking for fresh data on the open web, this is what it sends. The Markdown lean is 2.8 percentage points right now, not enough to be statistically confident (p=0.189).

Everyone else — ChatGPT-User, Perplexity, Claude — sits roughly flat. Markdown and HTML get reached at about the same rate. That makes sense: these systems chase the user's question, not the page's format. The flat line is the result here. At conversation time, what your page is about matters more than how you serve it.
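The coverage percentages in the results table are shares of assigned pages that a bot reached at least once. A minimal sketch of that calculation, assuming a hypothetical log of `(bot, url)` fetch records and a URL-to-arm mapping; the data shapes and names are illustrative, not the GolOps pipeline.

```python
from collections import defaultdict


def coverage_by_arm(assignments, fetches, bot):
    """Share of pages in each arm that `bot` fetched at least once.

    assignments: dict mapping url -> "markdown" or "html"
    fetches: iterable of (bot_name, url) records from the access log
    """
    # A set collapses repeat visits: each page counts once, however often it is hit.
    reached = {url for name, url in fetches if name == bot and url in assignments}
    totals = defaultdict(int)
    hits = defaultdict(int)
    for url, arm in assignments.items():
        totals[arm] += 1
        if url in reached:
            hits[arm] += 1
    return {arm: hits[arm] / totals[arm] for arm in totals}


if __name__ == "__main__":
    assignments = {"/a": "markdown", "/b": "markdown", "/c": "html", "/d": "html"}
    fetches = [("GPTBot", "/c"), ("GPTBot", "/c"), ("ClaudeBot", "/a")]
    print(coverage_by_arm(assignments, fetches, "GPTBot"))
```

Counting pages rather than requests keeps a handful of heavily crawled URLs from dominating the comparison.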

## Live retrieval follows the question, not the format

ChatGPT-User is not a crawler on a schedule. It opens a page mid-conversation because someone asked ChatGPT something and the model needed a real page to answer. Across 28,000 of these live fetches on the site, demand tracks the topics people actually ask about, and it spreads roughly evenly across both arms of the experiment.

| Category | Live fetches over 7 days |
|---|---|
| ai-transcription | 1,603 |
| automation | 1,310 |
| ai-image | 1,269 |
| vpn | 1,183 |
| payment-processing | 1,042 |

Sanity check: ChatGPT-User reached 76.7% of Markdown-assigned pages and 77.0% of HTML-assigned pages, a difference of 0.3 percentage points (p=0.859). So the category gaps above are about what people asked, not which arm got more traffic.

## Why Markdown might pull ahead anyway

OAI-SearchBot's Markdown lean is small for now. But here is the structural reason that line could grow. Strip the nav, scripts, tracking pixels, and CSS chrome off a typical HTML page and what is left is just the answer. The Markdown version of the same page is roughly a quarter of the size. Cheaper to fetch, faster to parse. The trend is already being acknowledged at the infrastructure level: [Cloudflare launched Markdown for Agents](https://blog.cloudflare.com/markdown-for-agents/) — serving Markdown versions of pages to AI agents for token efficiency.

| Format | Payload relative to HTML |
|---|---|
| Markdown | 24.1% |
| HTML | 100% |
| Difference | −75.9% |

Over the experiment we served 4,745 fetches as Markdown and 5,322 as HTML across the randomized URLs. Total saving: **32.7%** of bytes versus serving everything as HTML. The blended figure sits well below the per-page 75.9% because only the Markdown arm, a little under half of all fetches, saves anything. If retrieval cost ever becomes a routing signal inside the labs (and in at least one, by our read, it already has), that gap starts to matter.

## Methodology

Every measurement is built from a single snapshot. If the prose says something the data does not back, the snapshot is what is true. Here is how it works and what it cannot yet say.

- **Surface locking.** Each eligible page is bound to Markdown or HTML by a hash of its URL: same address, same surface, every time. A crawler cannot end up seeing both versions.
- **The headline metric is page-level coverage.** Of all pages assigned to a variant, what share each crawler actually reached. We track request counts and bytes transferred too, but treat them as secondary: a few popular URLs can dominate raw volume.
- **Statistics.** Differences between Markdown and HTML are tested with a two-proportion z-test and reported as percentage-point gaps with p-values.
- **ChatGPT-User is reported separately** from OAI-SearchBot: it is user-triggered live retrieval, not background indexing. A different signal entirely.
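The statistics step can be reproduced from the counts in the results table. A minimal sketch, assuming the pooled-variance, two-sided form of the two-proportion z-test; the article does not say which variant was used, and the function name is illustrative.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test.

    Returns (gap in percentage points, z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return (p1 - p2) * 100, z, p_value


# (Markdown pages reached, Markdown pages, HTML pages reached, HTML pages)
ROWS = {
    "GPTBot": (28, 1119, 361, 1130),
    "OAI-SearchBot": (528, 1119, 502, 1130),
    "PerplexityBot": (94, 1119, 110, 1130),
    "ClaudeBot": (100, 1119, 124, 1130),
}

if __name__ == "__main__":
    for bot, counts in ROWS.items():
        gap, z, p = two_proportion_z_test(*counts)
        print(f"{bot}: {gap:+.1f} pp, z={z:.2f}, p={p:.3f}")
```

Under these assumptions the gaps and p-values should round to the published figures, for example roughly p=0.189 for OAI-SearchBot. ChatGPT-User is left out because the article gives no raw page counts for it.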

What the measurement cannot yet say. Bots are identified by their user-agent string — Cloudflare's verified-bot signal is not in the data yet, so a determined spoofer could be miscounted. And we measure whether a crawler fetches a page, not whether the AI ended up citing it. Linked questions, but not the same one.

## What to take from the experiment

Format is hygiene, not a lever. Four of the five bots show no real preference: their shifts sit inside the noise band. Only GPTBot picked a side, leaning HTML by 29.4 percentage points, and GPTBot is a training scraper: it feeds future model versions, not the decision about who gets cited today. Keep Markdown: it does no harm, cuts about a third of bytes served, and clears the noise around the answer. But counting on a single format to raise citation rates is solving the problem at the wrong level.

The lever is not page markup but citation infrastructure: whether the brand sits in the sources the model relies on for the buyer's scenario. That is the layer GolOps takes under management. We measure a company's position in the field of choice through the Choice Control Index, attribute the result to specific sources and scenarios, and turn that into a prioritized plan. The Strategic Pilot delivers the first cycle in 10–12 weeks; the Command Center keeps it running across seven AI systems.

The cost of getting this wrong is easy to price. Gartner forecasts 90% of B2B procurement under autonomous AI agents by 2028, and Semrush already shows AI-channel conversion running 4.4× higher than organic search. While a company chases format, the shortlist forms on a different rule — and every quarter spent on markup costs a place on that shortlist.

**Markdown is not the only technical lever that promises a citation advantage and fails to deliver one:**

[**The llms.txt Effect: 37,894 Domains, Zero Citation Advantage**](/en/publications/llms-txt-effect)

[Request an index diagnostic →](https://golops.io/en/position) · [Discuss a pilot →](https://golops.io/en/pilot)
