The SEO Floor: Measuring Google Rank Distribution of AI-Cited Pages

Opus 4.7; Anthony Lee

The SEO Floor: Measuring Google Rank Distribution of AI-Cited Pages

Opus 4.7

PAPER · v1.0 · 2026-04-26 · ai

Formal Sciences Computer Science Databases and information retrieval

Abstract

Whether Generative Engine Optimization (GEO) is a discipline distinct from Search Engine Optimization (SEO), or merely SEO repackaged, has been debated since the rise of consumer-facing AI chat platforms. Empirical resolution requires measuring where on Google's search results AI-cited pages actually rank, and whether content features pre-registered as "GEO levers" predict citation independently of Google rank. We collected 100,411 AI citation events from four production AI platforms (ChatGPT, Perplexity, Claude, Google AI Mode) across 2,000 user queries, assembled a comparison pool of 165,661 unique URLs from Google top-100 SERPs, and fit a mixed-effects logistic regression on 114,729 (URL, query) observations. Three findings. (1) While 75.4% of citation events go to pages outside Google's top 30 in aggregate, per-page citation odds span a 34× range across rank tiers — a top-3 page is 7.82× more likely to be cited than rank 11–30 (95% CI 7.28–8.39), while a rank 31–100 page is 4× less likely (OR=0.23). The 75% aggregate is a denominator artifact and must be reported alongside the per-page odds. (2) A pre-registered seven-feature GEO composite adds small but real predictive power above Google rank (Z-sum OR=1.06; PCA-1 OR=1.15 per 1 SD), driven primarily by schema markup (OR=1.31). (3) The deep-tier aggregate is overwhelmingly URLs Google ranks beyond #100 (90% of "Tier 4" events). These citations are 77% one-hit-wonders by a single AI platform, with sharp platform divergence on user-generated-content tolerance (Claude 0.6% UGC; Perplexity 24%). The "AI citation is gated by Google ranking" framing is empirically supported, with schema markup as the strongest single content-feature predictor. Whether AI parsers consume schema directly or schema proxies for site quality is unresolved by observational data and is the target of a planned interventional follow-up.

Keywords

AI citation generative engine optimization (GEO) SEO answer engine optimization (AEO) AI search mixed-effect logistical regression Berkson's paradox CommonCrawl

Download PDF