Fast Computational GPU Design with GT-Pin
Top 17% of 2015 papers by citations.
Abstract
As computational applications become common for graphics processing units, new hardware designs must be developed to meet the unique needs of these workloads. Performance simulation is an important step in appraising how well a candidate design will serve these needs, but unfortunately, computational GPU programs are so large that simulating them in detail is prohibitively slow. This work addresses the need to understand very large computational GPU programs in three ways. First, it introduces a fast tracing tool that uses binary instrumentation for in-depth analyses of native executions on existing architectures. Second, it characterizes 25 commercial and benchmark OpenCL applications, which average 308 billion GPU instructions apiece and are by far the largest benchmarks that have been natively profiled at this level of detail. Third, it accelerates simulation of future hardware by pinpointing small subsets of OpenCL applications that can be simulated as representative surrogates in lieu of full-length programs. Our fast selection method requires no simulation itself and allows the user to navigate the accuracy/simulation speed trade-off space, from extremely accurate with reasonable speedups (35X increase in simulation speed for 0.3% error) to reasonably accurate with extreme speedups (223X simulation speedup for 3.0% error).
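The surrogate-selection idea in the abstract (simulate only a few representative slices of a program and weight their results) can be illustrated with a SimPoint-style clustering sketch: cluster fixed-size execution intervals by their instruction-mix vectors, then simulate just the interval nearest each cluster centroid, weighted by cluster size. This is a minimal sketch, not the paper's actual selection method; the k-means code, toy interval vectors, and function names are all hypothetical.

```python
# Hypothetical SimPoint-style surrogate selection (illustrative only):
# each interval is summarized by a feature vector, intervals are clustered,
# and one representative per cluster is simulated in lieu of the full run.

def dist2(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    centroids = points[:k]  # deterministic init, fine for a toy example
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist2(p, centroids[i]))].append(p)
        centroids = [
            [sum(c) / len(cl) for c in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

def pick_surrogates(points, k):
    # Return (interval index, weight) pairs: the interval closest to each
    # centroid stands in for its whole cluster, weighted by cluster size.
    centroids, clusters = kmeans(points, k)
    surrogates = []
    for cent, cl in zip(centroids, clusters):
        rep = min(cl, key=lambda p: dist2(p, cent))
        surrogates.append((points.index(rep), len(cl) / len(points)))
    return surrogates

# Toy instruction-mix vectors (ALU / memory / branch fractions) per interval.
intervals = [
    [0.80, 0.10, 0.10], [0.79, 0.11, 0.10], [0.20, 0.70, 0.10],
    [0.21, 0.69, 0.10], [0.80, 0.10, 0.10], [0.22, 0.68, 0.10],
]
reps = pick_surrogates(intervals, 2)
```

With two clearly separated behaviors (compute-heavy and memory-heavy intervals), `reps` names one interval from each group, and the weights sum to 1 so a weighted average of the two simulated slices estimates whole-program behavior.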
Related Papers
- Performance Analysis of Benchmarks for GPU-based Linear Programming Problem Solvers (2019)
- An Efficient Parallel Implementation of an Optimized Simplex Method in GPU-CUDA (2018)
- Accelerating Hadoop MapReduce on a CPU-GPU Heterogeneous Platform: A Case Study of the CKY Parser (2014)
- About Speedup Improvement of Classical Genetic Algorithms Using Cuda Environment (2016)
- Performance Analysis of OpenFOAM-Based CFD Solvers Using GPGPU (2021)