Static placement of computation on heterogeneous devices
Citations over time: top 10% of 2017 papers
Abstract
Heterogeneous architectures characterize today's hardware, from supercomputers to smartphones. Despite their importance, programming such systems remains challenging; in particular, it is hard to map computations to the different processors of a heterogeneous device. In this paper, we provide a static analysis that mitigates this problem. Our contributions are twofold. First, we provide a semi-context-sensitive algorithm that analyzes the program's call graph to determine the best processor for each calling context. This algorithm is parameterized by a cost model that takes into account the processors' characteristics and data transfer time. Second, we show how to use simulated annealing to calibrate this cost model for a given heterogeneous architecture. We have used these ideas to build Etino, a tool that annotates C programs with OpenACC or OpenMP 4.0 directives. Etino generates code for a CPU-GPU architecture without user intervention. Experiments on classic benchmarks reveal speedups of up to 75x. Moreover, our calibration process lets us avoid slowdowns of up to 720x that trivial parallelization approaches would yield.
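To make the two contributions more concrete, here is a minimal sketch of a cost model that weighs compute time against data transfer time, together with a simulated-annealing loop that calibrates its parameters. This is not Etino's actual model. The linear cost formula, the parameter names (`cpu_per_op`, `gpu_per_op`, `gpu_launch`, `transfer_per_byte`), the kernel fields (`ops`, `bytes`), and the objective (number of mispredicted placements on measured samples) are all assumptions chosen for illustration.

```python
import math
import random


def estimate_cost(params, kernel, device):
    """Hypothetical linear cost model: compute time plus data-transfer time."""
    if device == "cpu":
        return kernel["ops"] * params["cpu_per_op"]
    # Assumed GPU cost: fixed launch overhead, host-device transfer, compute.
    return (params["gpu_launch"]
            + kernel["bytes"] * params["transfer_per_byte"]
            + kernel["ops"] * params["gpu_per_op"])


def place(params, kernel):
    """Pick the device with the lowest estimated cost."""
    return min(("cpu", "gpu"), key=lambda d: estimate_cost(params, kernel, d))


def mispredictions(params, samples):
    """Count samples whose predicted device differs from the measured best."""
    return sum(place(params, kernel) != best for kernel, best in samples)


def calibrate(samples, initial, steps=2000, temp=1.0, cooling=0.995, seed=0):
    """Simulated annealing over the cost-model parameters (illustrative)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    current = dict(initial)
    current_err = mispredictions(current, samples)
    best, best_err = dict(current), current_err
    for _ in range(steps):
        # Neighbor: rescale one randomly chosen parameter, keeping it positive.
        candidate = dict(current)
        name = rng.choice(sorted(candidate))
        candidate[name] *= math.exp(rng.uniform(-0.5, 0.5))
        err = mispredictions(candidate, samples)
        delta = err - current_err
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_err = candidate, err
            if err < best_err:
                best, best_err = dict(candidate), err
        temp = max(temp * cooling, 1e-6)
    return best, best_err
```

In this sketch, `samples` would be a list of `(kernel, best_device)` pairs obtained by timing each kernel on both processors of the target machine. The calibrated parameters could then drive `place` for each calling context, so that small kernels whose transfer cost dominates stay on the CPU.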