Analysis · 7 min read · 2026-06-04

Your AEO Score Might Be Measuring the Wrong Thing

An AEO (answer engine optimization) score is only useful if it's measuring what matters. Over the past two months, our methodology research identified three systematic gaps in how most audits, including earlier versions of our own pipeline, score AI visibility. Each one produces misleading numbers in a different way. All three showed up in real client audits before we caught them.

The category mislabeling trap

Every AEO audit starts by generating queries based on what category a business competes in. Get the category wrong, and the entire audit scores visibility against a landscape that isn't yours.

In our May 30, 2026 methodology review, we documented three confirmed cases where this happened. CoLab Education described itself as "edtech SaaS" — but the product is educator networking, so our query generation compared it against Coursera, Canvas, and Duolingo rather than teacher community platforms. Race Data called itself a "data analytics company" — but it's a B2B database marketing agency, so the competitive set pulled in Snowflake and BigQuery instead of agency alternatives. Jupitrr self-described as "video production" when it operates as an AI video OS — placing it in a tier with Adobe and DaVinci Resolve rather than its actual peers.

All three returned Category Authority scores at or near 0.0. Not because those businesses are invisible in AI answers — but because they were being measured against competitors they're not actually competing with.

The root cause is straightforward: most audits use either the client's own self-description or a category inferred from the domain. Neither gets validated against the competitive landscape that AI models actually use when someone asks about that space. A niche SaaS company that calls itself "analytics" may look completely invisible on paper while having real presence in the queries that matter to its actual buyers.

This has to be caught before scoring, not after. We now run a category validation step that checks whether the submitted label maps to the real AI competitive landscape. When it doesn't, we reframe before queries are generated. If your audit doesn't include this step, your Category Authority number may be measuring a category you're not in.
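
As a rough illustration of that validation step (a minimal sketch, not our production pipeline), assume you have two lists: the competitors implied by the submitted label, and the brands AI models actually name alongside the business. The function names, the Jaccard overlap measure, and the 0.2 threshold are all hypothetical choices made for this example.

```python
def category_overlap(label_competitors, ai_named_peers):
    """Jaccard overlap between two competitor sets, ignoring case and whitespace."""
    a = {name.strip().lower() for name in label_competitors}
    b = {name.strip().lower() for name in ai_named_peers}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def needs_reframing(label_competitors, ai_named_peers, threshold=0.2):
    """Flag the label when the two sets barely overlap (threshold is an assumption)."""
    return category_overlap(label_competitors, ai_named_peers) < threshold


# Competitors implied by the "data analytics company" label in the Race Data case,
# versus placeholder peer names (not real audit data).
implied = ["Snowflake", "BigQuery"]
observed = ["Agency A", "Agency B"]
print(needs_reframing(implied, observed))  # True: no overlap, so reframe before scoring
```

A low overlap doesn't prove the label is wrong, but it is a cheap signal that the comparison set deserves a manual look before any queries are generated.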

Being cited and being cited correctly are different things

Even with the right category, there's a second failure mode: showing up in AI answers but getting described inaccurately.

In our May 24, 2026 methodology review, we incorporated a framework from a recent academic paper (arXiv:2604.25707v2) that distinguishes two stages of how AI models handle sources. The first is *selection* — whether a business appears in the retrieval set at all. The second is *absorption* — whether the retrieved source actually shapes the language and facts in the AI's answer.

Standard AEO audits track selection. They count how often a business appears in AI answers and treat that as the visibility score. But a business can score well on selection while failing on absorption: it gets mentioned frequently, but the AI consistently describes it wrong — wrong location, outdated services, invented attributes, or a generic description that could apply to any competitor in the category.

This isn't a hypothetical pattern. It's the mechanism behind most of the misattribution cases we've audited. The business is known enough to be retrieved, but the source material is thin or contradictory, so the AI reconstructs the description from conflicting signals rather than absorbing clean information from a reliable source.

The selection/absorption distinction changes the fix strategy entirely. A business with low selection — rarely appearing in answers — needs more external presence: directories, press, structured citations that expand its footprint. A business with low absorption — appearing but always described wrong — needs better source material: clear, specific, consistent content the model can actually absorb. Optimizing for selection when the real problem is absorption means spending effort in the wrong place.
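
The triage logic above can be sketched in a few lines. This is an illustrative example under stated assumptions: the `AuditResult` fields, the `diagnose` function, and both floor values are hypothetical, and "accurate" here means a reviewer judged the AI's description to match the business's own facts.

```python
from dataclasses import dataclass


@dataclass
class AuditResult:
    answers_sampled: int        # AI answers collected for relevant queries
    answers_mentioning: int     # answers that mention the business at all
    accurate_descriptions: int  # mentions whose description was judged accurate


def diagnose(result, selection_floor=0.25, absorption_floor=0.6):
    """Return which stage looks weak; both floors are illustrative assumptions."""
    if result.answers_sampled == 0:
        return "no_data"
    selection = result.answers_mentioning / result.answers_sampled
    if result.answers_mentioning == 0 or selection < selection_floor:
        return "low_selection"   # fix: expand external presence
    absorption = result.accurate_descriptions / result.answers_mentioning
    if absorption < absorption_floor:
        return "low_absorption"  # fix: improve source material
    return "ok"


# Mentioned in 16 of 40 answers, but described accurately in only 4 of those.
print(diagnose(AuditResult(40, 16, 4)))  # low_absorption
```

Checking selection first matters: absorption is only defined over answers that mention the business, so a low mention count makes the accuracy ratio too noisy to act on.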

We're now using this framing explicitly when explaining results that show decent mention counts alongside garbled descriptions. The score looks fine; the problem is downstream of the score.

ChatGPT's link behavior changed in May — and audits haven't adjusted

The third gap is more recent and more specific to one platform.

Our May 29, 2026 methodology review flagged a ChatGPT behavior change that started around May 7, 2026: ChatGPT began inserting organic inline hyperlinks to brand homepages directly in answers. The link rate jumped from approximately 0.4% of answers to 6.2%, roughly a 15x increase on those rounded figures. Among answers that mentioned a brand by name, the share that included a direct link went from roughly 2% to 29%.

These links carry utm_source=chatgpt.com and generate traceable referral traffic. More consequentially, linked mentions now feed into OpenAI's recommendation ranking model differently than unlinked bare-name mentions. Getting cited with a link is a meaningfully stronger signal than getting cited without one.
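
If you export landing-page URLs from your analytics, a small script can count the visits that carry this parameter. This is a minimal sketch using Python's standard `urllib.parse`; the function name and the `example.com` URLs are placeholders, and real analytics exports may expose the parameter as a separate column instead.

```python
from urllib.parse import urlparse, parse_qs


def is_chatgpt_referral(landing_url):
    """True when the URL's query string carries utm_source=chatgpt.com."""
    query = parse_qs(urlparse(landing_url).query)
    return "chatgpt.com" in query.get("utm_source", [])


# Placeholder URLs standing in for an analytics export.
urls = [
    "https://example.com/?utm_source=chatgpt.com",
    "https://example.com/pricing?utm_source=newsletter&utm_medium=email",
    "https://example.com/about",
]
chatgpt_hits = [u for u in urls if is_chatgpt_referral(u)]
print(len(chatgpt_hits))  # 1
```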

The problem for audits: most citation-tracking tools — including snapshot-based tools that query ChatGPT through the API rather than capturing browser-rendered output — don't distinguish between linked and unlinked mentions. A score that reports "ChatGPT mentions you in 40% of relevant queries" treats both types as equivalent. They're not equivalent anymore.
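
To make the distinction concrete, here is a rough sketch of how a tool could separate the two cases, assuming it already has the browser-rendered answer as an HTML string. The class and function names are hypothetical, and the name matching is deliberately naive (a plain substring check); this is not a description of how any existing tool works.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _LinkCollector(HTMLParser):
    """Collects anchor hrefs and visible text from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def classify_mention(rendered_html, brand_name, brand_domain):
    """Return 'linked', 'unlinked', or 'absent' for one rendered answer."""
    parser = _LinkCollector()
    parser.feed(rendered_html)
    parser.close()
    hosts = [(urlparse(h).hostname or "").lower() for h in parser.hrefs]
    domain = brand_domain.lower()
    if any(host == domain or host.endswith("." + domain) for host in hosts):
        return "linked"
    if brand_name.lower() in "".join(parser.text_parts).lower():
        return "unlinked"
    return "absent"


linked = '<p>Try <a href="https://www.example.com/?utm_source=chatgpt.com">Example Co</a>.</p>'
bare = "<p>Example Co is a popular option.</p>"
print(classify_mention(linked, "Example Co", "example.com"))  # linked
print(classify_mention(bare, "Example Co", "example.com"))    # unlinked
```

Reporting the two counts separately, rather than a single mention rate, is what lets an audit show whether a "40% of relevant queries" figure is made up of linked or bare-name mentions.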

If your audit shows strong ChatGPT numbers but you're not seeing referral traffic from chatgpt.com in your analytics, that gap is worth investigating. It may mean your mentions are unlinked — appearing in answers but not generating the signal that feeds downstream recommendation ranking.

We're tracking this as an active methodology gap. The implementation challenge is real — capturing linked versus unlinked consistently requires browser-rendered response capture, not API output — but the distinction is now material enough that audits ignoring it are understating the gap between strong and weak ChatGPT visibility.

What to do before you act on a score

None of these gaps mean AEO audits aren't useful. They mean the score needs context before you treat it as ground truth.

Before building a fix plan from an AEO score, check whether the category framing matches how AI models actually describe your competitive space — not just how you describe yourself. If your audit returned a low Category Authority score, find out what the comparison set was.

If your mention counts look reasonable but your AI descriptions are consistently wrong, the problem is absorption, not selection. The priority is better source material (clear, specific, consistent content about the business), not more directory listings.

And if you're using ChatGPT citation numbers to benchmark progress, find out whether your tool distinguishes linked from unlinked mentions. If it doesn't, the number has been blind to a distinction that matters since May 2026.

Signal Check at sourcepull.ca runs a free scan that surfaces where your citations are coming from, which platforms are active, and whether your descriptions match your own language. For the full diagnostic — including category validation and the selection/absorption breakdown — the paid audit covers each of these gaps in detail.
