Microsoft's New AI Citation Report: Useful, But Missing 80% of the Picture
In February 2026, Microsoft shipped something the answer engine optimization (AEO) space had been asking for: first-party data showing which of your pages Copilot used as grounding sources when generating AI answers. The AI Performance report in Bing Webmaster Tools is free, available to any verified site owner, and returns real citation counts per URL over time.
It is the first tool of its kind from any AI provider. Understanding exactly what it measures -- and what it does not -- determines how much of the picture it actually gives you.
What the Bing AI Performance report actually shows
The report, launched in public preview on February 10-11, 2026, gives verified Bing Webmaster Tools users three things: which specific URLs on their domain were used as grounding sources by Microsoft Copilot; the grounding queries -- internal sub-queries Copilot generated when retrieving source content; and citation counts per URL over time.
"Grounding source" has a specific technical meaning. When Copilot builds an answer, it retrieves content from pages that inform what it says. Those pages are grounding sources. Being a grounding source means Copilot read your content and drew on it.
What it does not mean: that your brand was named in the answer. You can be a grounding source without Copilot mentioning your business at all. The report measures backend retrieval activity, not in-answer brand presence.
This distinction matters in practice. Most AI visibility questions businesses actually care about -- "does AI recommend me," "is AI sending customers to my competitors instead of me," "what does AI say about my category" -- are questions about brand mention behavior. The Bing tool gives you the retrieval layer. It does not touch the mention layer.
The coverage gap: one platform out of five
Our June 2026 methodology rec on the Bing tool (2026-06-05-bing-ai-performance-fix-plan-addendum.md, session 38) places Copilot at roughly 10-20% of current AI search activity. ChatGPT, Perplexity, Claude, and Google AI Mode account for the remaining 80-90%, and none of them appear in the report.
That gap is not just a missing 80% of volume. Each platform has structurally different source preferences, retrieval architectures, and citation tendencies. The same business can score radically differently across platforms for reasons that have nothing to do with content quality.
In our April 2026 cross-platform audit (edge-cases/llm-cross-recommendation-bias-2026-04-25.md), we ran signal checks on the same set of domains across all four platforms in our panel and recorded per-platform scores independently. For openai.com, Gemini returned a score of 8.3 while Claude returned 1.6 -- a 6.7-point spread on the same domain. For perplexity.ai, ChatGPT and Gemini both scored 4.5 while Claude scored 7.8. Different platforms were reading the same content and arriving at opposite recommendations.
These are not small variations around an average. They reflect genuinely different behaviors: different source pools, different query patterns, and in some cases training-level tendencies that no content change can correct. Knowing how Copilot retrieves your pages tells you nothing reliable about what ChatGPT cites, what Perplexity surfaces in category queries, or how Claude describes your business to users.
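The spreads quoted from the April audit can be recomputed directly from the per-platform scores. The sketch below is illustrative only: it includes just the scores stated in this article, and the helper name platform_spread is ours, not part of any audit tooling.

```python
# Per-platform scores as quoted in this article. Platforms whose scores
# are not stated in the text are deliberately left out.
scores = {
    "openai.com": {"Gemini": 8.3, "Claude": 1.6},
    "perplexity.ai": {"ChatGPT": 4.5, "Gemini": 4.5, "Claude": 7.8},
}


def platform_spread(per_platform):
    """Return (spread, highest-scoring platform, lowest-scoring platform)."""
    high = max(per_platform, key=per_platform.get)
    low = min(per_platform, key=per_platform.get)
    # Round to one decimal to match the precision of the quoted scores.
    return round(per_platform[high] - per_platform[low], 1), high, low


for domain, per_platform in scores.items():
    spread, high, low = platform_spread(per_platform)
    print(f"{domain}: {spread}-point spread ({high} high, {low} low)")
```

Note that the platform scoring highest differs between the two domains, which is why a single averaged score would hide the disagreement.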
Grounding is not the only thing AI is doing with your brand
There is a second gap the Bing tool cannot address, and it became visible in how ChatGPT's behavior changed in May 2026.
Our methodology rec filed 2026-05-29 (chatgpt-linked-vs-unlinked-mentions.md, session 31) documented a behavior shift in ChatGPT that started on May 7. Before that date, ChatGPT embedded inline links to brand homepages in approximately 0.4% of answers. After May 7, the rate rose to 6.2% -- a roughly 14x overnight increase, based on qwairy.co's analysis of 140,000+ ChatGPT answers. Among answers that named a brand, the share that included a direct homepage link jumped from 2% to 29%.
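As a quick arithmetic check, the multipliers implied by the rounded percentages above can be computed in a few lines. The rounded inputs give about 15.5x for all answers and 14.5x for brand-naming answers, slightly above the quoted 14x; that is what you would expect if the quoted figure was computed from unrounded rates, which is an assumption on our part. The function name fold_change is ours.

```python
def fold_change(before_pct, after_pct):
    """Return how many times larger the 'after' rate is than the 'before' rate."""
    return after_pct / before_pct


# Rounded rates as quoted in the text (percent of answers).
all_answers = fold_change(0.4, 6.2)   # answers with an inline homepage link
brand_answers = fold_change(2, 29)    # brand-naming answers with a homepage link

print(f"all answers: {all_answers:.1f}x, brand-naming answers: {brand_answers:.1f}x")
```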
This created a meaningful new distinction in what "being cited" means on ChatGPT. An unlinked mention builds brand awareness -- the user sees your name. A linked mention generates direct referral traffic, and click behavior from those links may feed back into OpenAI's recommendation ranking. A brand that earns linked mentions and subsequent clicks is presumably reinforced in the model's behavior. A brand mentioned without a link is not generating that signal.
The Bing AI Performance report cannot see any of this. It measures what Copilot retrieves, not what ChatGPT presents. A business could have strong Copilot grounding activity and zero ChatGPT linked mentions -- or the reverse. These are different surfaces, different platforms, different measurement requirements.
Where the Bing tool earns its place
None of this makes the report useless. For site owners who want to understand their Copilot and Bing AI presence specifically, the AI Performance report provides something no third-party tool can: direct confirmation from Microsoft that specific pages are being retrieved for answer generation. That is real signal, and it is free.
It is most useful for technical validation: confirming that recent content is being indexed and retrieved by Copilot, identifying which service pages get used versus which are invisible to it, and checking whether changes to page content affected retrieval behavior over time. The grounding query data is particularly useful -- it shows you the internal sub-questions Copilot was asking when it pulled your content, which can inform how you frame and structure pages.
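A minimal sketch of that before-and-after check follows. It assumes you have transcribed per-URL citation counts for two periods into plain dictionaries; the report had no API as of February 2026, so how the numbers get out of the dashboard is left to you. The URLs and counts are made up for illustration, and compare_periods is a hypothetical helper.

```python
def compare_periods(before, after, all_urls):
    """Label each URL by how its citation count changed between two periods.

    `before` and `after` map URL -> citation count; URLs missing from a
    period are treated as having zero citations in that period.
    """
    report = {}
    for url in all_urls:
        b, a = before.get(url, 0), after.get(url, 0)
        if b == 0 and a == 0:
            status = "never cited"
        elif a > b:
            status = "up"
        elif a < b:
            status = "down"
        else:
            status = "flat"
        report[url] = (b, a, status)
    return report


# Made-up example data: counts before and after a content change.
before = {"https://example.com/services/a": 12, "https://example.com/blog/x": 3}
after = {"https://example.com/services/a": 20, "https://example.com/blog/x": 1}
pages = [
    "https://example.com/services/a",
    "https://example.com/services/b",  # a page that never shows up in the report
    "https://example.com/blog/x",
]

for url, (b, a, status) in compare_periods(before, after, pages).items():
    print(f"{url}: {b} -> {a} ({status})")
```

Passing in the full list of pages you care about, rather than only the URLs that appear in the report, is what surfaces the pages Copilot never retrieves.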
The limitations to be aware of: no click-through data, no view of who else Copilot cited for the same query (so no competitive context), no API access as of February 2026, and no coverage outside Microsoft's AI surfaces. The report covers roughly 10-20% of where AI-driven recommendations actually happen.
We recommend it to clients as a progress-tracking tool between audits -- one real monitoring signal for one real platform. The framing matters: this covers Copilot specifically. Use it alongside cross-platform audit data, not instead of it.
What cross-platform audit data covers that this does not
A complete picture of AI search visibility requires testing what each platform actually says in response to the queries your potential customers run. Not URL retrieval logs -- answer-level query testing.
That means running direct brand recognition queries, category recommendation queries, and evaluation-stage queries across ChatGPT, Perplexity, Gemini, and Claude. It means classifying each response: did the platform name your business, recommend it, or ignore it? It means checking accuracy -- whether the platform describes your business correctly or introduces errors that could mislead customers. And it means doing this across all four platforms simultaneously, because a strong Perplexity score and a weak ChatGPT score require entirely different fix sequences.
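The classification step can be sketched as a small data model plus a tally per platform. This is a deliberately crude illustration, not how Signal Check or any real audit classifies answers: the cue phrases, the Answer fields, the brand name, and the sample answer texts are all invented, and a keyword match stands in for what in practice needs human or model review.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Mention(Enum):
    RECOMMENDED = "recommended"
    NAMED = "named"
    IGNORED = "ignored"


# Hypothetical cue phrases; a real classifier would need far more care.
RECOMMEND_CUES = ("recommend", "top choice", "best option")


@dataclass
class Answer:
    platform: str
    query_type: str  # "brand", "category", or "evaluation"
    text: str


def classify(answer, brand):
    """Classify one answer as recommending, naming, or ignoring the brand."""
    text = answer.text.lower()
    name = brand.lower()
    if name not in text:
        return Mention.IGNORED
    # Crude heuristic: a cue phrase in the same sentence as the brand name.
    for sentence in text.split("."):
        if name in sentence and any(cue in sentence for cue in RECOMMEND_CUES):
            return Mention.RECOMMENDED
    return Mention.NAMED


def tally(answers, brand):
    """Count classifications per platform."""
    counts = {}
    for answer in answers:
        counts.setdefault(answer.platform, Counter())[classify(answer, brand)] += 1
    return counts


# Made-up answers for a made-up business.
answers = [
    Answer("ChatGPT", "category", "For drain repair, I recommend Acme Plumbing."),
    Answer("Perplexity", "category", "Local options include Acme Plumbing and others."),
    Answer("Claude", "category", "Several plumbers serve that area."),
]

for platform, counts in tally(answers, "Acme Plumbing").items():
    print(platform, {m.value: n for m, n in counts.items()})
```

The heuristic has obvious failure modes: it would misread "I would not recommend Acme Plumbing" as a recommendation, and splitting on periods breaks brand names that contain one. It also says nothing about accuracy, which needs a separate check against known facts about the business.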
The Bing Webmaster Tools AI Performance report contributes a fifth data point, for one platform, on the retrieval layer. It does not cover the other four platforms, the mention layer on any platform, or the accuracy of what any AI says about your business.
Signal Check at sourcepull.ca runs the cross-platform query set and returns per-platform scores with accuracy classification, typically in under three minutes. The Bing AI Performance report is worth setting up -- it is free, first-party, and useful for tracking Copilot-specific retrieval. But it is a starting point, not a complete answer to whether AI is recommending your business.