Deep Dive · 6 min read · 2026-06-07

Why Generic Schema Won't Help Your AI Citations

For the past several months, we've been tracking two pieces of research on schema markup and AI citations. They appear to contradict each other at first glance -- and the contradiction is worth unpacking because the reconciliation changes what we tell clients.

One study shows that adding schema to a web page doesn't increase how often AI platforms cite it. The second shows that pages with attribute-rich schema are cited at meaningfully higher rates than pages with generic CMS-default schema. Both findings are real. Both are defensible. They measure different things.

The Ahrefs study: adding schema doesn't help already-cited pages

We covered this in depth in our May 2026 session 26 research, documented in our schema-markup-effects knowledge file. The Ahrefs study tracked 1,885 pages that added JSON-LD schema markup against 4,000 control pages, using difference-in-differences analysis -- the first controlled study on schema and AI citation frequency.

Citation lift on Google AI Mode was +2.4% (not statistically significant). On ChatGPT, +2.2% (not significant). On Google AI Overviews, -4.6% (statistically significant, wrong direction). No platform showed meaningful citation uplift from adding schema.

The constraint that defines what this study actually measured: every treated page already had 100 or more AI citations before the intervention. These pages were already in AI consideration sets -- crawled, known, cited. The study tests whether adding schema pushes already-visible pages higher. It does not test what happens for a business with essentially zero AI presence.
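For readers who haven't worked with difference-in-differences: the estimate is the change in the treated group minus the change in the control group over the same period, which strips out whatever trend affected both groups. A minimal Python sketch of that calculation follows. The citation counts are invented for illustration and are not figures from the Ahrefs study, and the real analysis would also involve significance testing that this sketch omits.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Return the difference-in-differences estimate.

    Each argument is a list of per-page citation counts for one group
    in one period.
    """
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    # The control group's change stands in for what would have happened
    # to the treated pages without the intervention.
    return treated_change - control_change


# Hypothetical per-page citation counts (NOT data from the study).
treated_before = [120, 150, 130]
treated_after = [126, 156, 139]
control_before = [110, 140, 125]
control_after = [115, 146, 129]

effect = diff_in_diff(treated_before, treated_after, control_before, control_after)
print(f"estimated effect: {effect:+.1f} citations per page")
```

In this made-up example the treated pages gained 7 citations on average and the control pages gained 5, so only 2 of the 7 would be attributed to the intervention.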

The Fischman study: schema quality creates a 22-point gap

In February 2026, Kurt Fischman published a correlational study on SSRN (abstract ID 6284518) analyzing 730 AI citations across platforms. Where the Ahrefs study asked "does adding schema increase citations?", Fischman asked a different question: among pages that already have schema, do citation rates differ by implementation quality?

We documented the findings in our June 7, 2026 update to the schema-markup-effects research file. The results for lower-authority domains (domain rating below 60):

| Schema implementation | Citation rate | DR range |
|---|---|---|
| Generic schema (CMS-default Article/Organization) | 31.8% | DR < 60 |
| Attribute-rich Product/Review schema | 54.2% | DR < 60 |
| Any schema type | Narrows toward parity | DR > 75 |

For high-authority domains (DR above 75), schema type differences largely disappear. Authority signals dominate citation decisions at that level, and schema quality matters far less. For lower-authority sites -- which is where essentially every local business and early-stage SaaS product lives -- the gap between generic and attribute-rich implementations is 22.4 percentage points (31.8% versus 54.2%).
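The 22-point figure is a percentage-point difference, which is easy to confuse with a relative lift. A quick Python check using only the two DR < 60 rates from the table above shows both. Because the study is correlational, either number describes an observed association, not a guaranteed effect of changing your schema.

```python
generic_rate = 31.8         # citation rate (%) for generic CMS-default schema, DR < 60
attribute_rich_rate = 54.2  # citation rate (%) for attribute-rich schema, DR < 60

# Absolute gap, in percentage points.
point_gap = attribute_rich_rate - generic_rate

# The same gap expressed relative to the generic baseline, in percent.
relative_lift = point_gap / generic_rate * 100

print(f"gap: {point_gap:.1f} points, relative lift: {relative_lift:.0f}%")
# prints "gap: 22.4 points, relative lift: 70%"
```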

The Ahrefs sample only included pages with 100+ pre-existing citations. By definition, those pages had already crossed whatever threshold brings them into AI consideration. Fischman's dataset includes lower-authority sites across the full range of citation presence. The two studies are sampling different populations.

Why the studies are compatible, not contradictory

Ahrefs: does schema addition move the needle for a page AI systems already know about? Answer: no. Once a page is in an AI consideration set, adding schema doesn't push it further.

Fischman: among pages that have schema, does implementation quality affect citation rates at lower authority levels? Answer: yes, substantially. Generic CMS-default schema carries no measurable citation rate advantage. Attribute-rich, domain-specific schema that populates concrete fields correlates with a 22-point higher citation rate.

The practical read: if your business is already appearing in ChatGPT or Perplexity answers regularly, the Ahrefs data gives no reason to expect that adding schema will increase that frequency. If your business has essentially no AI citation presence yet, how well you implement schema correlates with whether you'll be cited at all.

This is consistent with the framing from our methodology rec filed in session 26 (2026-05-23): schema is entity establishment infrastructure, not a citation frequency lever. The Fischman data adds a layer to that -- it's entity establishment infrastructure that only functions well when it's actually populated.

What "attribute-rich" means in practice

The category that underperforms in the Fischman data isn't the wrong schema type. It's schema that's present but unpopulated. CMS-default Organization and Article schema ships with whatever the platform could auto-infer: a name pulled from the `<title>` tag, a description from the meta description, maybe a URL. The specific, meaning-carrying fields stay empty or get filled with placeholder text.

From our session 26 methodology rec: the `sameAs` property is where most SMB implementations fail. A `sameAs` array pointing to directories where you have no verified profile sends no useful signal. Each URL in `sameAs` needs to resolve to a live, accurate listing that uses your canonical business name -- the property merely existing in the JSON-LD is not enough.

For a local business, attribute-rich `LocalBusiness` schema means real values in `name`, `address`, `telephone`, `url`, `openingHours`, and a `sameAs` array with verified directory profiles. The schema type matters less than whether the fields contain actual data.
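As a rough illustration of what "populated" looks like, here is a minimal Python sketch that assembles a `LocalBusiness` JSON-LD object with every field listed above filled in. The helper name `build_local_business`, the address, the phone number, and the `example.com` URLs are all placeholders invented for this example; substitute your own verified values. The property names themselves (`address` as a `PostalAddress`, `openingHours`, `sameAs`) come from the schema.org vocabulary.

```python
import json


def build_local_business(name, street, city, region, postal_code, country,
                         telephone, url, opening_hours, same_as):
    """Assemble a schema.org LocalBusiness JSON-LD object as a dict."""
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
            "addressCountry": country,
        },
        "telephone": telephone,
        "url": url,
        "openingHours": opening_hours,
        "sameAs": same_as,
    }


# All values below are placeholders, not a real business.
schema = build_local_business(
    name="Smith HVAC & Plumbing",
    street="123 Example St",
    city="Exampleville",
    region="ON",
    postal_code="A1A 1A1",
    country="CA",
    telephone="+1-555-010-0000",
    url="https://www.example.com",
    opening_hours=["Mo-Fr 08:00-17:00"],
    same_as=[
        "https://directory-a.example.com/smith-hvac-plumbing",
        "https://directory-b.example.com/smith-hvac-plumbing",
    ],
)

json_ld = json.dumps(schema, indent=2)
print(json_ld)
```

The printed JSON is what would go inside a `<script type="application/ld+json">` element on the page. Generating it from one function call makes it harder for the name or phone number to drift between pages.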

Entity consistency is part of the same problem

In our June 5, 2026 update to the wikidata-entity-disambiguation research file (originally developed in session 3, updated through session 9), we tracked the mechanism behind why schema quality matters beyond just field completeness.

AI systems don't resolve your business from a single source. They reconcile signals from multiple sources and assign a confidence score to the entity match. If your `LocalBusiness` schema says "Smith HVAC & Plumbing" but your Google Business Profile says "Smith HVAC" and your Yelp listing says "Smith Plumbing," the model may treat these as three separate entities -- each with partial weight, none with dominant authority. The confidence score fragments across name variants instead of accumulating on one entity.
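A simple audit can surface this kind of fragmentation before a model does. The Python sketch below groups sources by the business name each one shows, using the three names from the example above. It is only a bookkeeping aid under a strict assumption (names must match exactly after lowercasing and whitespace cleanup); it does not reproduce how any AI system actually scores entity matches, and the function names are hypothetical.

```python
from collections import defaultdict


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(name.lower().split())


def group_name_variants(listings):
    """Group source labels by the normalized business name they display.

    `listings` maps a source label (e.g. "schema") to the name shown there.
    """
    variants = defaultdict(list)
    for source, name in listings.items():
        variants[normalize(name)].append(source)
    return dict(variants)


listings = {
    "schema": "Smith HVAC & Plumbing",
    "google_business_profile": "Smith HVAC",
    "yelp": "Smith Plumbing",
}

variants = group_name_variants(listings)
is_consistent = len(variants) == 1

for name, sources in variants.items():
    print(f"{name!r}: {', '.join(sources)}")
print("consistent" if is_consistent else f"{len(variants)} competing name variants")
```

With the example data this reports three competing variants, one per source. A clean result is a single group containing every source.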

The Wikidata Embedding Project (launched October 2025 by Wikimedia Deutschland, Jina.AI, and DataStax) converted all 119 million Wikidata entries into vector representations directly accessible for retrieval-augmented generation pipelines. For businesses with a Wikidata entry, AI systems can now resolve entity matches with significantly higher confidence during live inference -- not just at training time. But that resolution only works if your schema `name` field and your Wikidata entry use the same name.

Generic CMS schema that auto-populates from the page title -- which often doesn't match your canonical business name -- adds inconsistency to the entity signal instead of anchoring it. From our June 5 research: "A `sameAs` link to a Wikidata Q-item, Crunchbase profile, and Product Hunt listing gives the model three separate traversal paths to the same entity record. The value is in having actual verified entries at those URLs, not in the JSON-LD property existing."

The implementation order that follows from this

Establish a canonical business name and freeze it before anything else. Inconsistent schema across sources creates more entity confusion than no schema at all -- if the name in your JSON-LD doesn't match the name in your GBP and your Yelp listing, every schema block adds a competing entity description rather than reinforcing one.

Implement `LocalBusiness` schema with every concrete field populated: name, address, telephone, URL, hours, and a `sameAs` array pointing to verified, live directory profiles that use the same canonical name.

Disable or remove generic CMS-default schema if you can't populate it correctly. An auto-populated Organization schema block with a slightly different business name than your GBP is noise in the entity signal, not an improvement. In the Fischman data, generic schema at DR below 60 shows a 31.8% citation rate, 22.4 points below attribute-rich implementations, and carries no measurable citation advantage.
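The three steps above can be turned into a quick pre-publish check. This Python sketch flags a JSON-LD block that is missing any of the concrete fields or whose `name` differs from the frozen canonical name. The `audit_schema_block` helper and the sample block are hypothetical, and the required-field list simply mirrors the `LocalBusiness` checklist in this article; adjust it for other schema types.

```python
REQUIRED_FIELDS = ("name", "address", "telephone", "url", "openingHours", "sameAs")


def audit_schema_block(block, canonical_name):
    """Return a list of human-readable problems found in one JSON-LD block."""
    problems = []
    for field in REQUIRED_FIELDS:
        # Treat absent keys, empty strings, and empty lists the same way.
        if not block.get(field):
            problems.append(f"missing or empty: {field}")
    name = block.get("name")
    if name and name != canonical_name:
        problems.append(
            f"name {name!r} does not match canonical {canonical_name!r}"
        )
    return problems


# Hypothetical auto-generated block whose name was pulled from the page title.
cms_default = {
    "@type": "Organization",
    "name": "Home | Smith HVAC",
    "url": "https://www.example.com",
}

problems = audit_schema_block(cms_default, canonical_name="Smith HVAC & Plumbing")
for problem in problems:
    print(problem)
```

A block that returns an empty list is a candidate to keep. One that returns problems you cannot fix is a candidate to disable.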

For businesses already appearing in AI answers regularly, schema work will not increase citation frequency. The Ahrefs finding is clear on this. For businesses starting from zero AI visibility, schema done correctly -- specific types, populated fields, consistent canonical name across every `sameAs` target -- is part of what gets you into the consideration set to begin with.

To see where your entity signal currently stands -- whether AI platforms are resolving your business accurately and which signals are creating gaps -- Signal Check at sourcepull.ca runs a live test across ChatGPT, Perplexity, Gemini, and Google AI Mode in under 60 seconds.
