Analysis · 6 min read · 2026-05-15

Why Your AI Visibility Score Doesn't Tell You What to Fix

Getting a low AI visibility score is easy. Most businesses get one the first time they check. What's hard is knowing what to do about it.

The AEO measurement space in 2026 has converged on polling-based visibility rates: run 250-500 queries against your brand, average the results, report a percentage. The tools that do this well give you a statistically stable number -- your visibility rate this month vs. last month, your share of voice vs. competitors.

That number is useful for one thing: monitoring whether you're improving after making changes. It is not useful for figuring out what to change.

Monitoring tools and diagnostic tools are different instruments

In our 2026-05-14 methodology investigation of the multi-sampling standard that has become the AEO category norm, we documented the exact framing the industry has adopted. Search Engine Land's coverage of LLM optimization tracking put it this way:

> "AI answers are probabilistic, and running the same prompt twice may yield different citations. Any tool that doesn't account for this by running multiple passes and averaging the results is presenting noise as signal."

The polling model is designed for statistical reliability. Run the same queries many times, smooth out the variance, get a stable rate. This is the right approach for:

- Detecting visibility drops after a model behavior change
- Comparing month-over-month improvement after implementing a fix
- Reporting share of voice to stakeholders who need a consistent metric
- Monitoring multiple brands across a competitive category

What it cannot do: identify which failure mode is causing a low rate. A 3% visibility rate from 500 runs tells you with high confidence that you appear in 3% of those queries. It does not tell you whether that's because of entity confusion, content absence, category-page exclusion, or platform-specific crawl issues. Those failures each require a different fix. Knowing the score doesn't choose between them.
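To make the arithmetic concrete, here is a minimal sketch of what a polling rate actually computes. The function and data shape are illustrative, not any vendor's pipeline: the input is nothing more than a list of booleans recording whether the brand appeared in each run.

```python
import math

def polling_visibility_rate(appearances: list[bool]) -> tuple[float, float]:
    """Visibility rate from repeated runs of the same query set,
    with a 95% normal-approximation margin of error."""
    n = len(appearances)
    rate = sum(appearances) / n
    # Standard error of a binomial proportion; 1.96 is the 95% z-score.
    margin = 1.96 * math.sqrt(rate * (1 - rate) / n)
    return rate, margin

# 500 runs, brand cited in 15 of them: a statistically tight 3%.
runs = [True] * 15 + [False] * 485
rate, margin = polling_visibility_rate(runs)
print(f"visibility: {rate:.1%} +/- {margin:.1%}")  # 3.0% +/- 1.5%
```

The interval is tight, which is exactly the point -- and the limitation. The input carries no information about why any given run came back False.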

Our 2026-05-14 research used the hospital framing: a daily temperature reading tells you the patient is sick. It does not tell you what's wrong or what to prescribe. The monitoring rate is the temperature. The diagnosis is the blood test.

Why query repetition misses what query diversity finds

Our 2026-05-14 methodology investigation made this distinction explicit: "Running 'What is [Brand]?' three times tells you whether the brand query is stable. Running 'What is [Brand]?' once AND 'Who should use [Brand]?' AND 'How much does [Brand] charge?' tells you which of the three query types is failing -- and they often fail for different reasons."

- Brand queries test entity recognition: does the model know who you are?
- Category queries test recommendation authority: does the model cite you when someone is shopping for what you sell?
- Service queries test content specificity: does the model have enough detail about your offerings to surface you in narrow, need-based searches?

A business can have strong brand recognition and near-zero category authority. These are different failure modes with different fixes that sit in different parts of the AI infrastructure. Running one query type 250 times gives you a stable rate for that query type. Running 12 brand queries, 12 category queries, and 15 service queries once each tells you which dimension is failing.
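Here is a sketch of what that per-dimension scoring looks like in practice. The `QueryResult` shape and type labels are our illustrative assumptions, not a published schema:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class QueryResult:
    query_type: str    # "brand" | "category" | "service"
    brand_cited: bool  # did the answer cite or recommend the brand?

def diagnose(results: list[QueryResult]) -> dict[str, float]:
    """Citation rate per query type from single-pass diverse queries."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in results:
        totals[r.query_type] += 1
        hits[r.query_type] += r.brand_cited
    return {qtype: hits[qtype] / totals[qtype] for qtype in totals}

# 12 brand + 12 category + 15 service queries, one pass each, might yield
# {"brand": 0.83, "category": 0.08, "service": 0.20} -- a split that points
# at category authority as the failing dimension, not entity recognition.
```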

For diagnostic purposes, query diversity beats repetition. For trend monitoring, repetition beats diversity. The AEO space has built tools optimized for the second use case. The first is still underserved.

What source-level analysis actually surfaces

Abstract failure-mode identification is necessary but not sufficient. The fix requires knowing not just that you're absent from category queries, but which pages Perplexity is reading when it answers those queries -- and whether you appear in any of them.

In our 2026-05-13 platform citation analysis (session 16), we ran the first systematic review of actual source URLs returned per query type across a live Sourcepull audit -- 102 Perplexity source URLs across brand, category, and service queries. The findings were specific in a way a visibility percentage cannot be.

For category queries, Perplexity cited four domains above all others: tryprofound.com, meltwater.com, visible.seranking.com, and aeoaudittool.com. That last site is a small independent tool directory with no meaningful domain authority by traditional SEO metrics. It appeared in 3 of 4 category queries -- identical frequency to a well-funded competitor with a full content program.
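The tally behind that kind of finding is simple: for each domain, count how many distinct queries cited it at least once. A minimal sketch, with an input shape and example URLs that are illustrative rather than the actual audit data:

```python
from collections import Counter
from urllib.parse import urlparse

def domain_coverage(sources_by_query: dict[str, list[str]]) -> Counter:
    """For each domain, count how many distinct queries cited it at
    least once. Input maps a query to the source URLs its answer used."""
    coverage = Counter()
    for urls in sources_by_query.values():
        for domain in {urlparse(u).netloc for u in urls}:
            coverage[domain] += 1
    return coverage

# Illustrative input, not the audit data:
category_sources = {
    "best AEO audit tools": ["https://aeoaudittool.com/tools",
                             "https://tryprofound.com/blog/aeo"],
    "top AEO platforms 2026": ["https://aeoaudittool.com/tools",
                               "https://meltwater.com/guides/aeo"],
}
print(domain_coverage(category_sources).most_common())
# [('aeoaudittool.com', 2), ('tryprofound.com', 1), ('meltwater.com', 1)]
```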

Our 2026-05-14 investigation into why this happens identified the mechanism: the domain name is an exact keyword match for the query concept. Perplexity's retrieval layer weights pages where the URL and title directly signal category relevance. From our investigation: "A page titled 'best AEO audit tools' on a $9/year domain can outperform an established media site for that exact category query if the domain and title are direct keyword matches."
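Perplexity's retrieval scoring is not public, so treat the following as a toy model of the exact-match effect our investigation describes, not an implementation of it. The weights are arbitrary; the point is the shape of the tradeoff:

```python
def toy_retrieval_score(query: str, url: str, title: str,
                        authority: float) -> float:
    """Toy model: fraction of query terms matched in the URL and title,
    weighted against a generic 0-1 authority signal."""
    terms = query.lower().split()
    url_match = sum(t in url.lower() for t in terms) / len(terms)
    title_match = sum(t in title.lower() for t in terms) / len(terms)
    return 2.0 * url_match + 2.0 * title_match + 1.0 * authority

query = "best aeo audit tools"
print(toy_retrieval_score(query,
                          "https://aeoaudittool.com/best-aeo-audit-tools",
                          "Best AEO Audit Tools", authority=0.1))  # 4.1
print(toy_retrieval_score(query,
                          "https://bigmediasite.com/marketing",
                          "Marketing Trends 2026", authority=0.9))  # 0.9
```

On this toy scoring, the $9/year exact-match domain wins the category query outright despite near-zero authority -- the same pattern the audit's source data showed.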

This changes what the fix looks like entirely. The brand's 0.0/10 category score on Perplexity is a symptom. The source analysis tells you the actual diagnosis: the brand is absent from the specific pages Perplexity synthesizes from when answering the category query. The fix is not "build more domain authority." It is "get listed in the comparison directories Perplexity is actually reading -- starting with the ones that already appear in your audit's source data."

A monitoring tool tracks the 0.0/10 score quarter over quarter. The source analysis tells you exactly what would change it.

The sequence matters

Monitoring and diagnosis are not competing methodologies. They belong in sequence.

A diagnostic audit runs first: identify the specific failure mode per platform, find which pages and directories matter for your category, and build a fix plan with concrete actions -- not just "get more links" but "get listed on these specific pages because Perplexity is reading them for your category queries."

Monitoring takes over after: track whether the visibility rate improves as you implement fixes, detect unexpected drops, report progress over time.

Skipping the diagnostic step and going straight to monitoring gives you a stable number with no theory about what's moving it. The number can improve without you knowing why (fragile) or stay flat despite real effort (because the effort targeted the wrong failure mode).

Most businesses skip the diagnostic step because monitoring scores feel actionable. A percentage next to your brand name looks like a problem you can solve. But a percentage without a failure-mode diagnosis is not an action item. It tells you that something is broken. It does not tell you which thing.

A Sourcepull audit is built as a diagnostic instrument: 39 diverse queries across 4 platforms, source URL analysis of Perplexity category behavior, and a fix plan that names specific directories and pages rather than general recommendations. If you want a faster first look before deciding whether a full diagnostic makes sense, the Signal Check at sourcepull.ca scores your AI presence across all four platforms in about 60 seconds and flags the most obvious gaps.

See how your business scores on AI platforms.

Check your score — free