The 11% Citation Overlap Problem: Why Cross-Engine Averaging Fails
Key findings
- 11% domain overlap between ChatGPT and Perplexity for identical queries (ziptie.dev, 2026)
- Cross-engine composite scores have no statistical validity
- Engine-specific measurement is the only defensible approach
The measurement
We replicated the ziptie.dev cross-platform citation study using 1,200 commercial-intent queries probed simultaneously on ChatGPT and Perplexity.
For each query, we recorded all domains cited in the response. We then calculated the Jaccard similarity coefficient between the ChatGPT citation set and the Perplexity citation set.
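As a rough illustration of the per-query computation (not the actual pipeline), the sketch below computes the Jaccard coefficient for each query from the two engines' citation sets; the record structure and field names are hypothetical.

```python
from statistics import median

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical records: one entry per query, listing the domains each engine cited.
results = [
    {"query": "best crm for small business",
     "chatgpt": {"hubspot.com", "zoho.com", "salesforce.com"},
     "perplexity": {"zoho.com", "pipedrive.com", "forbes.com"}},
    # ... one record per remaining query
]

similarities = [jaccard(r["chatgpt"], r["perplexity"]) for r in results]
print(f"median Jaccard similarity: {median(similarities):.2f}")
```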
Results
The median Jaccard similarity was 0.11 (11%). In other words, for a typical query, 89% of the combined set of domains cited across the two engines appears in only one engine's response.
This finding has a direct methodological consequence: any measurement system that averages citation rates across engines produces a composite score with no statistical meaning. You cannot meaningfully average across distributions with 89% divergence.
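To see why a composite hides more than it reveals, consider a purely illustrative example with made-up numbers: a brand with very different citation shares on each engine produces a blended score that describes neither engine's behaviour.

```python
# Hypothetical citation shares for one brand (fraction of queries where it was cited).
chatgpt_share = 0.30      # cited in 30% of ChatGPT responses
perplexity_share = 0.02   # cited in 2% of Perplexity responses

# A naive cross-engine "visibility score" averages the two...
composite = (chatgpt_share + perplexity_share) / 2   # 0.16

# ...but 16% matches neither engine: re-measuring either engine cannot reproduce it,
# and a change in the composite cannot be attributed to either engine's behaviour.
print(f"ChatGPT: {chatgpt_share:.0%}, Perplexity: {perplexity_share:.0%}, composite: {composite:.0%}")
```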
Implications for measurement
- Engine-specific metrics are mandatory. A brand's citation share on ChatGPT is a different measurement from its citation share on Perplexity. Combining them into one number destroys information.
- "AI visibility scores" are methodologically indefensible. Any product offering a single cross-engine score is producing a number that cannot be reproduced, cannot be decomposed, and cannot be attributed to any specific engine behaviour.
- Rectifies measures per-engine, always. Every metric we publish is engine-specific. Cross-engine comparisons are presented as separate measurements, never averaged.