Pitfalls of long-term online controlled experiments
Abstract
Online controlled experiments (e.g., A/B tests) are now regularly used to guide product development and accelerate innovation in software. Product ideas are evaluated as scientific hypotheses and tested on websites, mobile applications, desktop applications, services, and operating system features. One of the key challenges for organizations that run controlled experiments is to select an Overall Evaluation Criterion (OEC), i.e., the criterion by which to evaluate the different variants. The difficulty is that short-term changes to metrics may not predict the long-term impact of a change. For example, raising prices likely increases short-term revenue but also likely reduces long-term revenue (customer lifetime value) as users abandon. Degrading search results in a search engine causes users to search more, which increases query share in the short term but also increases abandonment and thus reduces long-term customer lifetime value. Ideally, an OEC is based on metrics in a short-term experiment that are good predictors of long-term value. To assess long-term impact, one approach is to run long-term controlled experiments and assume that long-term effects are represented by observed metrics. In this paper, we share several examples of long-term experiments and the pitfalls associated with running them. We discuss cookie stability, survivorship bias, selection bias, and perceived trends, and share methodologies that can be used to partially address some of these issues. While there is clearly value in evaluating long-term trends, experimenters running long-term experiments must be cautious, as results may be due to the above pitfalls more than the true delta between the Treatment and Control. We hope our real examples and analyses will sensitize readers to the issues and encourage the development of new methodologies for this important problem.