Public Methodology
Large-language-model outputs are non-deterministic even when the query and engine stay fixed. GEO therefore treats replica count as part of the measurement design, not an implementation detail.
A single replica can overstate or understate visibility because ranking, mention placement, and citation behavior fluctuate across repeated draws. GEO averages replica outcomes within a run before reporting the run-level statistic, then bootstraps across runs to obtain confidence intervals.
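The within-run averaging step can be sketched as follows. This is a minimal illustration, not the GEO implementation: the function name run_presence_rate and the sample data are hypothetical, and presence is assumed to be recorded as one boolean per replica.

```python
from statistics import mean


def run_presence_rate(replica_hits: list[bool]) -> float:
    """Average one run's replica outcomes into a single run-level presence rate."""
    if not replica_hits:
        raise ValueError("a run needs at least one replica")
    return mean([1.0 if hit else 0.0 for hit in replica_hits])


# Hypothetical data: three runs of five replicas each (True = brand mentioned).
runs = [
    [True, False, True, True, True],
    [True, True, False, False, True],
    [True, True, True, True, False],
]

# One statistic per run; these run-level values are what gets resampled later.
run_level = [run_presence_rate(replicas) for replicas in runs]
```

Averaging first means that each run contributes one value to the later resampling step, so a run with an unlucky single draw does not dominate the reported figure.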
Presence volatility: variance in whether the brand appears across replicas of the same run.
Rank volatility: dispersion in the observed rank position when the brand is mentioned.
Citation share volatility: dispersion in cited-source share attributable to the brand across replicas.
Stability score: the run-aggregated presence consistency statistic exposed by MeasurementReadService.overview().
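One way the first three definitions above could be computed is sketched below. The Replica record, the function names, and the choice of population variance and population standard deviation as the dispersion measures are all assumptions made for illustration; the actual formulas in metrics.py may differ. The stability score is left out because its formula is not stated here.

```python
from dataclasses import dataclass
from statistics import pstdev, pvariance
from typing import Optional


@dataclass
class Replica:
    """Hypothetical per-replica record; field names are illustrative only."""
    mentioned: bool
    rank: Optional[int]      # None when the brand is not mentioned
    citation_share: float    # fraction of cited sources attributed to the brand


def presence_volatility(replicas: list[Replica]) -> float:
    # Variance of the 0/1 presence indicator across replicas of one run.
    return pvariance([1.0 if r.mentioned else 0.0 for r in replicas])


def rank_volatility(replicas: list[Replica]) -> Optional[float]:
    # Dispersion of rank, restricted to replicas where the brand is mentioned.
    ranks = [r.rank for r in replicas if r.mentioned and r.rank is not None]
    return pstdev(ranks) if ranks else None


def citation_share_volatility(replicas: list[Replica]) -> float:
    # Dispersion of the brand's cited-source share across all replicas.
    return pstdev([r.citation_share for r in replicas])


# Hypothetical run of five replicas.
sample_run = [
    Replica(True, 2, 0.2),
    Replica(True, 3, 0.3),
    Replica(False, None, 0.0),
    Replica(True, 2, 0.2),
    Replica(True, 1, 0.3),
]
```

Rank volatility returns None when the brand is never mentioned, since there is no rank to measure in that case.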
The chart below is illustrative rather than customer data. It shows how the average of five replicas is materially steadier than any single replica draw for the same query.
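The same effect can be reproduced numerically with simulated data. The sketch below assumes each replica is an independent draw with a hypothetical 60% chance of mentioning the brand; under that assumption the spread of a five-replica average is roughly 1/sqrt(5), or about 45%, of the spread of a single draw. The rate, trial count, and seed are illustrative, not GEO parameters.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)   # fixed seed so the illustration is repeatable
TRUE_RATE = 0.6          # hypothetical chance that one replica mentions the brand
N_TRIALS = 2000          # number of simulated measurements of each kind


def draw_replica() -> float:
    """Simulate one replica: 1.0 if the brand appears, else 0.0."""
    return 1.0 if rng.random() < TRUE_RATE else 0.0


# Measurements based on a single replica each.
single_draws = [draw_replica() for _ in range(N_TRIALS)]

# Measurements based on the average of five replicas each.
five_replica_means = [
    mean([draw_replica() for _ in range(5)]) for _ in range(N_TRIALS)
]

spread_single = pstdev(single_draws)
spread_mean5 = pstdev(five_replica_means)
```

Comparing spread_mean5 with spread_single shows the reduction in dispersion that averaging buys for the same query.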
In the public implementation, GEO bootstraps run-level aggregates with 1,000 resampling draws and a fixed random seed so that the resulting intervals are reproducible. Production helpers live in packages/monitor/src/geo_monitor/services/statistics.py, while volatility metrics are assembled in packages/monitor/src/geo_monitor/services/metrics.py.
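A seeded bootstrap over run-level values can be sketched as below. Only the 1,000 draws and the use of a fixed seed come from the description above; the function name bootstrap_interval, the seed value, the 95% level, and the percentile method are assumptions, and the helpers in statistics.py may work differently.

```python
import random
from statistics import mean


def bootstrap_interval(
    run_values: list[float],
    n_draws: int = 1000,   # matches the stated 1,000 draws
    seed: int = 0,         # assumed value; any fixed seed gives repeatable output
    level: float = 0.95,   # assumed interval level
) -> tuple[float, float]:
    """Percentile bootstrap interval for the mean of run-level aggregates."""
    if not run_values:
        raise ValueError("need at least one run-level value")
    rng = random.Random(seed)
    n = len(run_values)
    # Resample runs with replacement and record the mean of each resample.
    resampled_means = sorted(
        mean(rng.choices(run_values, k=n)) for _ in range(n_draws)
    )
    alpha = (1.0 - level) / 2.0
    lo_idx = int(alpha * n_draws)
    hi_idx = min(n_draws - 1, int((1.0 - alpha) * n_draws))
    return resampled_means[lo_idx], resampled_means[hi_idx]


# Hypothetical run-level presence rates from six runs.
example_runs = [0.8, 0.6, 0.8, 0.4, 1.0, 0.6]
lower, upper = bootstrap_interval(example_runs)
```

Because the generator is seeded inside the function, calling it twice with the same inputs returns the same interval, which is the reproducibility property described above.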