Analysis · 7 min read · 2026-06-27

44% of ChatGPT Citations Come From the First Third of the Page

Most of the advice on getting cited by AI engines focuses on what you write -- which questions to answer, what schema to add, how many words per FAQ entry. A study published in March 2026 shifts the frame: it's also about where on the page the answer appears.

Kevin Indig (Growth Memo, March 23, 2026) analyzed 3 million ChatGPT responses and extracted 30 million citations, then verified 18,012 for positional data. The finding: 44.2% of the verified citations originate from the first 30% of page content, 31.1% come from the middle 40%, and 24.7% come from the last 30%. The reported p-value is 0.0 (a rounded value).
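
One way to read these shares is as density: citation share per unit of page length. The sketch below is illustrative arithmetic on the three reported figures only. The `density` helper and the band labels are our own names, not part of Indig's study.

```python
# Citation shares by page band, as reported in the study.
# Each entry: (band label, band width in % of page length, citation share in %).
bands = [
    ("first 30%", 30, 44.2),
    ("middle 40%", 40, 31.1),
    ("last 30%", 30, 24.7),
]


def density(width_pct, share_pct):
    """Points of citation share per 1% of page length within a band."""
    return share_pct / width_pct


densities = {label: density(width, share) for label, width, share in bands}
total_share = round(sum(share for _, _, share in bands), 1)

for label, value in densities.items():
    print(f"{label}: {value:.2f} points of citation share per 1% of page")
print(f"Shares sum to {total_share}%")
```

On these figures the first 30% carries roughly 1.47 points of citation share per 1% of page, against about 0.78 for the middle band and 0.82 for the last. At this three-band resolution, the steepest drop is between the opening and the middle of the page.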

We documented this in our June 25, 2026 update to `knowledge/content-format-citation-signals.md` (Scout session 58). It changes how we sequence content recommendations in Sourcepull fix plans.

Why citation probability front-loads

The "ski ramp" shape -- Indig's label -- describes a curve that's steep at the top and tapers toward the bottom. On this reading, ChatGPT retrieves a page and forms its synthesis from early content first. The mechanism Indig proposes: LLMs are trained heavily on journalism and academic writing, both of which follow "bottom line up front" conventions, so the model establishes its interpretive frame from the opening, then reads later content through that lens.

A page that leads with company history before answering anything pushes its most informative content down the page, toward the bands that account for 31.1% and 24.7% of citations rather than the band that accounts for 44.2%.

Indig's data also quantifies the ranking-to-citation gap. Among pages ranking #1 in Google, 43.2% were cited by ChatGPT -- 3.5x higher than pages outside the top 20. That's still a minority of top-ranked pages. Organic ranking creates the retrieval prerequisite; what's in the first third determines whether that retrieval becomes a citation.
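
The two reported figures also imply a rough citation rate for pages outside the top 20. The arithmetic below is a sketch, not a number from the study: it assumes "3.5x higher" means the #1 rate is 3.5 times the outside-top-20 rate, and the variable names are ours.

```python
rate_at_rank_1 = 43.2    # % of #1-ranked pages cited by ChatGPT, as reported
reported_multiple = 3.5  # "3.5x higher than pages outside the top 20", as reported

# Assumption: the multiple is a plain ratio between the two citation rates.
implied_rate_outside_top_20 = rate_at_rank_1 / reported_multiple

# Share of #1-ranked pages that were retrieved-eligible but not cited.
uncited_at_rank_1 = 100 - rate_at_rank_1

print(f"Implied citation rate outside the top 20: {implied_rate_outside_top_20:.1f}%")
print(f"#1-ranked pages not cited: {uncited_at_rank_1:.1f}%")
```

Under that assumption, more than half of #1-ranked pages go uncited, which is the gap the first third of the page has to close.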

Structure alone moved citation rates 17%

The ski ramp finding fits a pattern we first documented in our June 18, 2026 knowledge update (Scout session 51, `knowledge/content-format-citation-signals.md`).

In March 2026, researchers Yu, Yang, Ding, and Sato published "Structural Feature Engineering for Generative Engine Optimization" (arXiv:2603.29979), measuring how document structure -- independent of content quality -- affects AI citation rates. Their result: structural changes alone lifted citation rates by 17.3% on average across six mainstream AI search engines without modifying the underlying content. Subjective answer quality improved 18.5% in human evaluation through structural changes alone.

The framework identifies three levels:

**Macro-structure:** Document architecture -- how the page opens, how sections flow. A service page that leads with service category, city, and a specific claim is better macro-structured than one that opens with a company overview.

**Meso-structure:** Information chunking -- paragraph length, whether comparisons appear in tables or embedded in prose. The GEO-SFE study found single-claim paragraphs and comparison tables produced the strongest meso-level citation improvements. AI systems parse tables as discrete extraction units; a comparison grid can be lifted without touching the surrounding prose.

**Micro-structure:** Bold claims, lists, visual emphasis. Useful, but the smallest driver of the three.

The 17.3% figure isolates structure from content: same information, different organization, materially different citation rate. It is a relative lift, not 17 percentage points, and most content rewrites don't produce a lift of that size.
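
Because the study reports that structure "lifted citation rates by 17.3%", the figure is most naturally read as a relative lift. The sketch below shows what that means in practice. The 10% baseline is a hypothetical number chosen for illustration, not a value from the paper, and `apply_relative_lift` is our own helper.

```python
def apply_relative_lift(baseline_rate_pct, lift_pct):
    """Return the new rate after a relative lift, e.g. 17.3 means x1.173."""
    return baseline_rate_pct * (1 + lift_pct / 100)


baseline = 10.0  # hypothetical baseline citation rate, in %
lifted = apply_relative_lift(baseline, 17.3)
point_gain = lifted - baseline

print(f"Baseline {baseline:.2f}% -> {lifted:.2f}% after a 17.3% relative lift")
print(f"Gain in percentage points: {point_gain:.2f}")
```

At that hypothetical baseline the gain is under two percentage points. The same relative lift produces a larger absolute gain on pages that already get cited more often.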

The definition-first rule

The ski ramp study makes the positional mechanism specific enough to write a rule from it.

Every section opener -- FAQ answer, H2 section, service page introduction -- should begin with a standalone answer to the implied question. Not context. Not acknowledgment of the question. The answer.

The difference matters at the sentence level. For an emergency plumbing page:

**Buried answer:** "At Smith Plumbing, we understand that burst pipes are one of the most stressful situations a homeowner can face. Our team of licensed plumbers is available around the clock to help."

**Definition-first:** "Emergency plumbers typically arrive within 60-90 minutes for burst pipe calls in Hamilton. Smith Plumbing dispatches from downtown and covers Hamilton and surrounding areas."

The first version spends both of its sentences on empathy and availability without stating an arrival time or a service area; any specifics arrive in sentence three at the earliest. Under the ski ramp distribution, content positioned later in the section sits in a lower-share citation window. The second version puts specific information -- arrival time, city, business name -- in sentences one and two, within the highest-density citation window.

RAG systems frequently extract only the first sentence or first paragraph of a retrieved chunk. If sentence one contains no answer, the extracted chunk contains no answer, and the source isn't cited -- regardless of how strong the rest of the page is.
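
A quick way to audit openers is to look at sentence one in isolation, the way a first-sentence extractor would. The check below is a minimal sketch under our own assumptions: the filler phrase list, the "contains a digit" test, and the `required_terms` parameter are illustrative heuristics, not rules taken from any platform or study.

```python
import re

# Hypothetical filler phrases, taken from the examples in this article.
FILLER_PHRASES = ("we understand", "we believe", "great question")


def first_sentence(text):
    """Naive split: cut at the first '.', '!' or '?' that is followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]


def opener_report(text, required_terms=()):
    """Report whether the first sentence alone carries specifics."""
    sentence = first_sentence(text)
    lowered = sentence.lower()
    return {
        "first_sentence": sentence,
        # A digit is a rough proxy for a concrete claim (time, price, count).
        "has_number": bool(re.search(r"\d", sentence)),
        "missing_terms": [t for t in required_terms if t.lower() not in lowered],
        "filler_phrase": next((p for p in FILLER_PHRASES if p in lowered), None),
    }


buried = (
    "At Smith Plumbing, we understand that burst pipes are one of the most "
    "stressful situations a homeowner can face. Our team of licensed plumbers "
    "is available around the clock to help."
)
definition_first = (
    "Emergency plumbers typically arrive within 60-90 minutes for burst pipe "
    "calls in Hamilton. Smith Plumbing dispatches from downtown and covers "
    "Hamilton and surrounding areas."
)

buried_report = opener_report(buried, required_terms=("Hamilton",))
direct_report = opener_report(definition_first, required_terms=("Hamilton",))
print(buried_report)
print(direct_report)
```

The sentence splitter is deliberately simple and will misfire on abbreviations such as "St." or "Inc.", so treat the output as a prompt for manual review rather than a score.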

How other platforms handle position

The ski ramp data is specific to ChatGPT. Our June 18, 2026 update (Scout session 51) documents related patterns across other platforms.

For Claude, the standard appears tighter at the paragraph level. Internal audit observations in `knowledge/platform-citation-behaviors.md` document Claude's extraction behavior: "A page with strong overall quality but poorly structured individual paragraphs may be read but not cited. Claude's extractor wants the key answer surfaced in sentence one of a paragraph." In our audits, the positional gradient seen in the ChatGPT data looks closer to a paragraph-level requirement for Claude.

For Perplexity, the mechanism is live RAG retrieval rather than positional weighting across a page. But the practical implication is similar. Our June 20, 2026 methodology rec on Perplexity fix plan ordering (`methodology-recs/2026-06-20-perplexity-two-bar-fix-plan-section.md`, Scout session 53) documents that Perplexity "extracts candidate citations from paragraph openers, not from buried body text." A source that clears Perplexity's retrieval bar (directory presence, Google organic ranking) still needs its content in the paragraph opener to clear the answer absorption bar.

Across platforms, front-loaded and directly answered content is extracted more reliably than content where the answer appears after framing or context.

The long-content question

A claim that circulates in AEO practitioner content: long-form pages get cited more. The ski ramp data is the clearest rebuttal we've found.

Citation probability is front-loaded, not length-proportional. A 300-word page that opens with a direct answer and specific vocabulary can outperform a 2,000-word page that buries the answer in the third section. What's in the first 30% of a page determines citation probability more than total word count.
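
To see why position and length are separate questions, it helps to locate an answer as a fraction of the page. The function below is a rough sketch that assumes position is measured by character offset. This article does not describe how the study measured position, so the offset method, the `position_band` name, and the sample text are all our own assumptions.

```python
def position_band(page_text, snippet):
    """Return (fraction, band) for where snippet starts, or None if it is absent."""
    index = page_text.find(snippet)
    if index == -1:
        return None
    fraction = index / len(page_text)
    if fraction < 0.30:
        band = "first 30%"
    elif fraction < 0.70:
        band = "middle 40%"
    else:
        band = "last 30%"
    return fraction, band


answer = "Emergency plumbers typically arrive within 60-90 minutes."
filler = "Our company has a long and proud history. "  # stand-in for framing text

short_page = answer + " " + filler * 5                 # short, answer first
long_page = filler * 40 + answer + " " + filler * 5    # long, answer buried

short_result = position_band(short_page, answer)
long_result = position_band(long_page, answer)
print("short page:", short_result)
print("long page:", long_result)
```

The longer page contains the same answer and far more text, yet the answer lands in the last band. Adding length without moving the answer up only pushes it further down the curve.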

Indig's data adds a domain concentration finding: roughly 30 domains own 67% of citations in any topic area. High-citation-share domains front-load their answers as a consistent structural feature. The concentration isn't purely domain authority -- it reflects an accumulated structural pattern in how these pages are organized.

For local businesses competing against directory citations from Yelp, Angi, or BBB, this is a useful reference point. Those directory pages are structured for extraction: business name, category, and rating appear first. A business's own page needs to match that structural density in the first third, not necessarily outrank the directories in volume.

What to revise first

The positional finding, combined with the definition-first rule and the GEO-SFE structural evidence, points to a specific revision order for the Phase 2 content layer of a fix plan:

Service page opening paragraphs: state the service, the city, and one specific claim in sentences 1-2.

FAQ answers: direct answer in sentence 1, supporting specifics in sentences 2-3. Not "we believe..." or "great question."

About and introduction sections: lead with what the business does and where it operates. Brand history and founding story can appear lower on the page.

These are structural changes, not content rewrites. The underlying information doesn't change -- its position within the page hierarchy does. The GEO-SFE study puts a number on what that repositioning produces: 17.3% more citations on average, across six platforms, without changing what the page says.

Signal Check at sourcepull.ca runs a cross-platform citation test across ChatGPT, Perplexity, Gemini, and Claude and shows per-platform scores with a prioritized fix plan. For businesses that have directory presence and some citation activity but want to increase frequency, the Phase 2 content layer applies the structural principles above -- specific to the pages and platforms where the gaps appear.
