Analysis · 7 min read · 2026-05-24

43% of Relevant Pages Get Zero AI Citations. Generic Fixes Miss the Point.

Most AI visibility advice follows the same pattern: add schema, improve your content, get more directory listings, earn backlinks. Apply broadly. Repeat. The implicit model is that AI visibility is a general authority problem -- build enough of the right signals and citations will follow.

A March 2026 paper from Tian, Chen, Tang, Liu, Jia et al. (arXiv:2603.09296) put a number on how often that model fails: **43% of topically relevant webpages receive no AI citation under baseline conditions.**

These are not irrelevant pages. They're relevant to the query. The failure is somewhere in the pipeline between retrieval and generation -- and the paper's central finding is that where in that pipeline a page fails determines what fix actually works.

The 43% problem isn't about relevance

In our May 24, 2026 investigation of generative engine optimization (GEO) citation failure research (knowledge/geo-citation-failure-taxonomy.md, session 27), we documented the paper in detail. Its framing is worth stating precisely: existing GEO methods measure contribution (does page X appear?) without diagnosing citation failure (does page X fail to appear because of a specific, correctable cause?).

The gap matters because the fixes are different depending on where in the pipeline a page drops out. A page that fails at retrieval -- that never makes it into the platform's source pool at all -- needs a different intervention than a page that gets retrieved but is ignored during answer generation. Generic content improvements applied to a page with a retrieval problem will not move its citation rate, because the page's content was never the issue.

The 43% figure is the scale of the problem for relevant pages. It's not 5%. It's not a fringe case. More than two in five pages that are topically on-target for a query receive no citation -- and most of them are failing for diagnosable, correctable reasons that have nothing to do with their content quality.

Targeted repair outperforms generic rewriting by a meaningful margin

The same paper introduces AgentGEO, a diagnostic system that identifies which failure mode applies to a specific document, selects a targeted repair from a corresponding set of interventions, and iterates until citation is achieved.
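The diagnose, repair, iterate loop described above can be sketched in a few lines. This is a minimal illustration of the control flow only, not the paper's implementation: the function `diagnose_and_repair`, the failure-mode labels, and the toy diagnosis and repair rules are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A diagnoser returns a failure-mode label, or None when no failure is detected.
Diagnose = Callable[[str], Optional[str]]
# A repair takes the page text and returns a minimally edited version.
Repair = Callable[[str], str]


def diagnose_and_repair(
    page: str,
    diagnose: Diagnose,
    repairs: Dict[str, Repair],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Apply one targeted repair per round until no failure mode is found."""
    applied: List[str] = []
    for _ in range(max_rounds):
        mode = diagnose(page)
        if mode is None:          # nothing left to fix
            break
        repair = repairs.get(mode)
        if repair is None:        # diagnosed, but no known repair for this mode
            break
        page = repair(page)
        applied.append(mode)
    return page, applied


# Toy rules, purely illustrative (hypothetical failure modes).
def toy_diagnose(page: str) -> Optional[str]:
    if "Founded" not in page:
        return "missing_entity_facts"
    if not page.startswith("Summary:"):
        return "buried_answer"
    return None


TOY_REPAIRS: Dict[str, Repair] = {
    "missing_entity_facts": lambda p: p + " Founded <year> in <city>.",
    "buried_answer": lambda p: "Summary: " + p,
}

fixed_page, applied_modes = diagnose_and_repair(
    "We offer audits.", toy_diagnose, TOY_REPAIRS
)
print(applied_modes)
print(fixed_page)
```

The point of the structure is that each round changes only what the diagnosis calls for, which is how a targeted approach can keep the share of modified content small.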

The results: **40% relative improvement in citation rates while modifying only 5% of content.**

Baseline generic rewriting methods -- the kind that apply uniform content improvements without diagnosing the failure mode -- achieve around 25% improvement, with far more content modification required.

That gap of 15 percentage points in relative improvement comes from the diagnosis step. Not from better content recommendations. Not from more comprehensive schema. From identifying which specific failure mode applies to a specific page before deciding what to fix.
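To make the arithmetic concrete, here is a small worked example. It assumes "relative improvement" means a multiplicative gain on a baseline citation rate, and it uses a hypothetical baseline of 20%, a figure chosen for illustration that does not come from the paper.

```python
def rate_after(baseline_rate: float, relative_gain: float) -> float:
    """Citation rate after applying a relative (multiplicative) improvement."""
    return baseline_rate * (1.0 + relative_gain)


BASELINE = 0.20          # hypothetical baseline citation rate
TARGETED_GAIN = 0.40     # relative improvement reported for targeted repair
GENERIC_GAIN = 0.25      # approximate relative improvement for generic rewriting

targeted_rate = rate_after(BASELINE, TARGETED_GAIN)
generic_rate = rate_after(BASELINE, GENERIC_GAIN)

# Gap between the two relative improvements, in percentage points.
relative_gap_points = round((TARGETED_GAIN - GENERIC_GAIN) * 100)
# Gap between the resulting citation rates, in percentage points.
absolute_gap_points = round((targeted_rate - generic_rate) * 100, 1)

print(relative_gap_points, absolute_gap_points)
```

Under these assumptions the 15-point difference between the two relative improvements works out to a 3-point difference in citation rate. The absolute gap scales with the baseline, so the size of the gain for a given page depends on where it starts.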

In our investigation, we noted the direct competitive implication: most answer engine optimization (AEO) tools on the market offer generic content recommendations -- comprehensive rewriting, content gap analysis, keyword enrichment. The paper identifies that approach as the baseline, and the targeted diagnosis model as the improvement. The gap is not a marginal refinement. It's 40% versus 25% -- and the targeted approach requires less content change to get there.

Being cited isn't the same as being absorbed

A second paper, arXiv:2604.25707v2 (April 2026), "From Citation Selection to Citation Absorption," adds a framework that changes how you interpret citation scores.

In our May 24, 2026 methodology recommendation (2026-05-24-citation-selection-vs-absorption.md), we translated the paper's framework into Sourcepull's scoring context. The paper analyzed 602 controlled prompts across ChatGPT, Gemini, and Perplexity, producing 21,143 valid citations and 23,745 citation-level feature records. The core finding: **citation breadth and citation depth diverge.**

The paper defines two stages:

**Citation selection:** The platform retrieves a source and includes it in the response. The page appears in the citation list.

**Citation absorption:** The cited page actually contributes language, evidence, or factual content to the generated answer. Being in the citation list doesn't mean the page shaped the response.

A page can be selected -- appear as a citation -- without being absorbed. The AI acknowledges the source but draws its answer from elsewhere.

For businesses, this means a citation count is incomplete information. The relevant question is not just "am I appearing?" but "when I appear, does the AI actually use accurate information about me?"
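The distinction between the two stages can be expressed as two separate rates. The sketch below is illustrative only: the `PromptResult` record, its `absorbed` label, and the four sample rows are hypothetical, and the paper's own feature records may be structured differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PromptResult:
    prompt_id: str
    selected: bool   # the page appeared in the citation list
    absorbed: bool   # the answer drew language or facts from the page


def selection_rate(results: List[PromptResult]) -> float:
    """Share of prompts where the page was cited at all."""
    if not results:
        return 0.0
    return sum(r.selected for r in results) / len(results)


def absorption_rate(results: List[PromptResult]) -> float:
    """Share of cited appearances where the page shaped the answer."""
    selected = [r for r in results if r.selected]
    if not selected:
        return 0.0
    return sum(r.absorbed for r in selected) / len(selected)


# Hypothetical sample: cited on three of four prompts, absorbed on one.
sample = [
    PromptResult("q1", selected=True, absorbed=True),
    PromptResult("q2", selected=True, absorbed=False),
    PromptResult("q3", selected=True, absorbed=False),
    PromptResult("q4", selected=False, absorbed=False),
]

print(selection_rate(sample), round(absorption_rate(sample), 2))
```

Absorption is computed only over prompts where the page was selected, so a page with a strong citation count can still show a weak absorption rate, which is the divergence the paper describes.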

Two failure profiles that require different fixes

Our May 2026 methodology recommendation frames this as two distinct client profiles.

**High-selection / low-absorption:** The business appears across many AI queries, but the AI consistently gets facts wrong -- wrong category, wrong location, wrong service description. This is the misattribution case. Being selected but absorbed poorly is exactly what misattribution looks like: the AI found the business but drew on incorrect data. The fix isn't more directory listings. It's correcting the bad data in the places the AI is actually absorbing from -- directories with wrong NAP (name, address, phone) data, a Wikidata entry that doesn't exist or contains errors, competitor descriptions that have been grafted onto the business entity.

**Low-selection:** The business rarely appears regardless of how accurate its directory data is. The failure is earlier in the pipeline -- the platform can't retrieve anything. The fix here is the footprint: external indexed presence, crawlability, directory presence in the sources each platform actually reads for that category.

Applying the first-profile fix to a second-profile business -- cleaning up Wikidata and fixing NAP when the real problem is zero external presence -- produces no movement. Applying the second-profile fix to the first-profile business -- building more directory listings -- adds selection without addressing the accuracy failure. The business appears more often, incorrectly.
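The two profiles above amount to a triage rule: check selection first, then accuracy. Here is a minimal sketch of that ordering. The function name `triage`, the two cutoff values, and the use of an "accuracy rate" as a proxy for absorption quality are assumptions made for illustration, not Sourcepull's scoring method.

```python
from typing import List, Tuple

# Hypothetical cutoffs; nothing in the research fixes these values.
MIN_SELECTION = 0.25   # below this, treat the business as rarely selected
MIN_ACCURACY = 0.80    # below this, treat cited facts as unreliable


def triage(selection_rate: float, accuracy_rate: float) -> Tuple[str, List[str]]:
    """Map two measured rates to a failure profile and its matching fixes."""
    # Selection comes first: accuracy is only meaningful once the
    # business is being retrieved at all.
    if selection_rate < MIN_SELECTION:
        return "low-selection", [
            "build external indexed presence",
            "check crawlability",
            "get listed in the directories each platform reads for the category",
        ]
    if accuracy_rate < MIN_ACCURACY:
        return "high-selection / low-absorption", [
            "correct NAP data in directories",
            "create or fix the Wikidata entry",
            "remove competitor descriptions attached to the business entity",
        ]
    return "no profile matched", []


profile, fixes = triage(selection_rate=0.70, accuracy_rate=0.40)
print(profile)
```

Ordering the checks this way encodes the argument of this section: a business that is never retrieved gets footprint work, and data cleanup is only recommended once there are citations to be wrong about.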

What this looks like in Sourcepull's audit data

In our April 2026 investigation of Perplexity citation patterns (knowledge/perplexity-citation-triggers.md, session 2), we documented what distinguishes Perplexity-cited businesses from those that score zero in category queries.

Businesses that Perplexity cited consistently -- appearing in all four B-series category queries -- shared specific traits: a named founder or contact person, a founding year, quantified outcomes (client count, project count), a specific location, and at least one external reference on a high-trust domain that corroborated the on-site claims.

Businesses with zero category scores, like civicengagement.ca (1.8/10 aggregate, zero Perplexity citations), had the opposite profile: no external indexed presence. Perplexity stated directly that "no specific website or organization named civicengagement.ca appears in available search results." The platform couldn't retrieve anything because there was nothing external to retrieve. No amount of on-site content improvement would have moved that score. The gap was a footprint gap, not a content gap.

The dependency chain for Perplexity's category queries is concrete: Google organic ranking for the category + city query feeds approximately 60% of what Perplexity cites for category searches. Businesses not in Google's organic top 10 for their category terms rarely appear in Perplexity's B-series results. That's a different problem from misattribution, and it requires a different sequence of fixes -- one that starts with getting ranked, not with improving entity descriptions.

The case for diagnosis before recommendation

The 43%/40%/25% findings from arXiv:2603.09296 make the argument empirically: diagnosing which failure mode applies before recommending a fix produces meaningfully better outcomes than applying generic improvements uniformly. The failure taxonomy matters because the same visible symptom -- a low AI visibility score -- can stem from failure at retrieval, failure at generation, or failure at absorption. Each has a different cause and a different fix.

Generic content rewriting addresses one possible cause. Targeted diagnosis addresses the actual cause. The paper shows the difference in citation improvement rates.

Sourcepull's audit builds the per-platform, per-query breakdown that makes the failure mode visible -- whether a low score reflects a footprint gap, a misattribution pattern, a structural crawl block, or an absorption failure. The Signal Check at sourcepull.ca shows the score snapshot first: if category queries are returning zero results for your business, that's the starting question for understanding where in the pipeline the failure is happening.
