Hypothesis Formalization: Empirical Findings, Software Limitations, and Design Implications
Top 16% of 2022 papers (by citations)
Abstract
Data analysis requires translating higher-level questions and hypotheses into computable statistical models. We present a mixed-methods study aimed at identifying the steps, considerations, and challenges involved in operationalizing hypotheses into statistical models, a process we refer to as hypothesis formalization. In a formative content analysis of 50 research papers, we find that researchers highlight decomposing a hypothesis into sub-hypotheses, selecting proxy variables, and formulating statistical models based on data collection design as key steps. In a lab study, we find that analysts fixated on implementation and shaped their analyses to fit familiar approaches, even if sub-optimal. In an analysis of software tools, we find that tools provide inconsistent, low-level abstractions that may limit the statistical models analysts use to formalize hypotheses. Based on these observations, we characterize hypothesis formalization as a dual-search process that balances conceptual and statistical considerations under the constraints of data and computation, and we discuss implications for future tools.