Cache Conscious Task Regrouping on Multicore Processors
Citations Over Time: Top 10% of 2012 papers
Abstract
Because of interference in the shared cache on multicore processors, the performance of a program can be severely affected by its co-running programs. If job scheduling does not consider how a group of tasks uses the cache, performance may degrade significantly, and the degradation usually varies widely and unpredictably from run to run. In this paper, we use trace-based program locality analysis and make it efficient enough for dynamic use. We present a complete on-line system that periodically measures the parallel execution, predicts and ranks cache interference for all co-run choices, and regroups programs based on the prediction. We test our system on floating-point workloads and on mixed integer and floating-point workloads composed of SPEC 2006 benchmarks, and compare it with the default Linux job scheduler to show that the new system improves performance and reduces performance variation.
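The "predict and rank all co-run choices" step can be pictured with a minimal sketch. Everything here is an assumption for illustration and not the paper's method: the task names, the per-task cache footprints, the two-group setup, and in particular the `overflow` score, which is a simple placeholder standing in for the trace-based locality model that the abstract mentions but does not specify.

```python
from itertools import combinations


def regroupings(tasks):
    """Yield every way to split an even-sized task list into two equal groups."""
    first, rest = tasks[0], tasks[1:]
    half = len(tasks) // 2
    # Pin the first task to group A so mirror-image splits are not counted twice.
    for others in combinations(rest, half - 1):
        group_a = (first,) + others
        group_b = tuple(t for t in tasks if t not in group_a)
        yield group_a, group_b


def overflow(group, footprint, cache_size):
    """Placeholder interference score (assumption): demand beyond cache capacity."""
    return max(0.0, sum(footprint[t] for t in group) - cache_size)


def rank_regroupings(footprint, cache_size):
    """Return all equal two-way splits, lowest predicted interference first."""
    tasks = sorted(footprint)
    scored = [
        (overflow(a, footprint, cache_size) + overflow(b, footprint, cache_size), a, b)
        for a, b in regroupings(tasks)
    ]
    return sorted(scored)


# Hypothetical footprints (in MB) for four tasks sharing two 8 MB caches.
footprint = {"t0": 6.0, "t1": 5.0, "t2": 1.0, "t3": 0.5}
for score, group_a, group_b in rank_regroupings(footprint, cache_size=8.0):
    print(score, group_a, group_b)
```

With these made-up numbers, pairing the two largest tasks on one cache ranks last, while either split that separates them ranks first. An on-line system of the kind described would re-run such a ranking periodically with fresh measurements and move tasks accordingly.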