Taming hardware event samples for FDO compilation
Abstract
Feedback-directed optimization (FDO) is effective in improving application runtime performance, but it has not been widely adopted because of the tedious dual-compilation model, the difficulty of generating representative training data sets, and the high runtime overhead of profile collection. Using hardware-event sampling to generate estimated edge profiles overcomes these drawbacks. Yet hardware event samples are typically not precise at instruction or basic-block granularity, and these inaccuracies lead to missed performance when compared to instrumentation-based FDO. In this paper, we use multiple hardware event profiles and supervised learning techniques to generate heuristics that improve the precision of basic-block-level sample profiles, and to further improve the smoothing algorithms used to construct edge profiles. We demonstrate that sampling-based FDO can achieve an average of 78% of the performance gains obtained using instrumentation-based exact edge profiles for SPEC2000 benchmarks, matching or beating instrumentation-based FDO in many cases. The overhead of collection is only 0.74% on average, while compiler-based instrumentation incurs 6.8% to 53.5% overhead (and 10x overhead on an industrial web search application), and dynamic instrumentation incurs 28.6% to 1639.2% overhead.