Public Methodology

How GEO measures replica variance and stability.

Large-language-model outputs are non-deterministic even when the query and engine stay fixed. GEO therefore treats replica count as part of the measurement design, not an implementation detail.

Why replica count matters

A single replica can overstate or understate visibility because ranking, mention placement, and citation behavior fluctuate across repeated draws. GEO averages replica outcomes within a run to produce the run-level statistic, then bootstraps across runs to obtain interval estimates for it.

Presence volatility: variance in whether the brand appears across replicas of the same run.

Rank volatility: dispersion in the observed rank position when the brand is mentioned.

Citation share volatility: dispersion in cited-source share attributable to the brand across replicas.

Stability score: the run-aggregated presence consistency statistic exposed by MeasurementReadService.overview().
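
As a rough illustration of the first three definitions above, the sketch below computes each volatility figure from the replicas of one run. The Replica record, the function names, and the choice of population variance and population standard deviation as the dispersion measures are all assumptions made for this example; the text does not specify which estimator metrics.py uses. The stability score is left out because its formula is not given here.

```python
from dataclasses import dataclass
from statistics import pstdev, pvariance
from typing import Optional, Sequence


@dataclass(frozen=True)
class Replica:
    """Hypothetical record of one replica's outcome for a single query."""
    present: bool                # was the brand mentioned at all?
    rank: Optional[int]          # rank position when mentioned, else None
    citation_share: float        # brand's share of cited sources, 0.0 to 1.0


def presence_volatility(replicas: Sequence[Replica]) -> float:
    # Variance of the 0/1 presence indicator; equals p * (1 - p).
    return pvariance([1.0 if r.present else 0.0 for r in replicas])


def rank_volatility(replicas: Sequence[Replica]) -> Optional[float]:
    # Dispersion of rank, restricted to replicas where the brand appeared.
    ranks = [r.rank for r in replicas if r.present and r.rank is not None]
    if not ranks:
        return None  # brand never mentioned, so rank dispersion is undefined
    return pstdev(ranks)


def citation_share_volatility(replicas: Sequence[Replica]) -> float:
    # Dispersion of the brand's cited-source share across all replicas.
    return pstdev([r.citation_share for r in replicas])
```

Restricting rank volatility to replicas where the brand is mentioned follows the definition above; treating an unmentioned brand as "no rank" rather than as a very low rank is a design choice of this sketch.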

Illustrative variance compression

The chart below is illustrative rather than customer data. It shows how the average of five replicas is materially steadier than any single replica draw for the same query.

[Chart: Replicas 1 through 5, each shown as a single draw versus the five-replica average.]
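
The variance compression described above can be reproduced with a small simulation. This is a sketch under stated assumptions rather than GEO's code: replicas are modeled as independent yes/no presence draws with a made-up presence probability of 0.6, and the function name, trial count, and seed are illustrative. Under independence, the standard deviation of a five-replica average is expected to be about the single-replica standard deviation divided by the square root of five.

```python
import random
from statistics import fmean, pstdev


def simulate_spread(p_present=0.6, replicas=5, trials=10_000, seed=0):
    """Compare the spread of one replica with the spread of a replica average.

    All parameters are illustrative assumptions, not GEO settings.
    """
    rng = random.Random(seed)
    singles = []
    averages = []
    for _ in range(trials):
        # One simulated run: independent 0/1 presence outcomes per replica.
        draws = [1.0 if rng.random() < p_present else 0.0 for _ in range(replicas)]
        singles.append(draws[0])        # what a single-replica design would report
        averages.append(fmean(draws))   # what a five-replica average would report
    return pstdev(singles), pstdev(averages)


single_sd, averaged_sd = simulate_spread()
print(f"single replica sd: {single_sd:.3f}")
print(f"five-replica average sd: {averaged_sd:.3f}")
```

Real replicas may be correlated, in which case the compression would be smaller than this independent-draw model suggests.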

Statistical treatment

GEO bootstraps run-level aggregates with 1,000 draws using a fixed random seed for reproducibility in the public implementation. Production helpers live in packages/monitor/src/geo_monitor/services/statistics.py, while volatility metrics are assembled in packages/monitor/src/geo_monitor/services/metrics.py.
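
A minimal sketch of that two-step treatment follows: average replicas within each run, then bootstrap the run-level values with 1,000 draws and a fixed seed. The function names, the seed value of 0, the 95% level, and the percentile method for reading off the interval are assumptions for illustration; they are not taken from statistics.py.

```python
import random
from statistics import fmean
from typing import Sequence, Tuple


def run_level_means(runs: Sequence[Sequence[float]]) -> list:
    # Step 1: collapse each run's replica outcomes into one run-level value.
    return [fmean(replicas) for replicas in runs]


def bootstrap_interval(
    run_values: Sequence[float],
    draws: int = 1000,     # matches the 1,000 draws stated above
    seed: int = 0,         # assumed value; any fixed seed gives reproducibility
    level: float = 0.95,   # assumed interval level
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the mean of run-level values."""
    if not run_values:
        raise ValueError("need at least one run-level value")
    rng = random.Random(seed)  # local generator, so global state is untouched
    n = len(run_values)
    # Step 2: resample runs with replacement and record each resample's mean.
    means = sorted(fmean(rng.choices(run_values, k=n)) for _ in range(draws))
    alpha = (1.0 - level) / 2.0
    lo_idx = int(alpha * draws)
    hi_idx = min(draws - 1, int((1.0 - alpha) * draws))
    return means[lo_idx], means[hi_idx]


# Four hypothetical runs of five presence replicas each (1 = brand present).
runs = [[1, 1, 0, 1, 1], [1, 0, 0, 1, 1], [1, 1, 1, 1, 0], [0, 1, 1, 1, 1]]
low, high = bootstrap_interval(run_level_means(runs))
```

Resampling whole runs rather than individual replicas keeps the run as the unit of analysis, which is consistent with bootstrapping run-level aggregates as described above.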