Exploring the multiple-GPU design space
Abstract
Graphics processing units (GPUs) have been growing in popularity due to their impressive processing capabilities and, with general-purpose programming interfaces such as NVIDIA's CUDA, are becoming the platform of choice in the scientific computing community. Previous studies that used GPUs focused on obtaining significant performance gains from execution on a single GPU. These studies employed low-level, architecture-specific tuning in order to achieve sizeable benefits over multicore CPU execution. In this paper, we consider the benefits of running on multiple (parallel) GPUs to provide further performance speedup. Our methodology allows developers to accurately predict execution time for GPU applications while varying the number and configuration of the GPUs and the size of the input data set. This is a natural next step in GPU computing because it allows researchers to determine the most appropriate GPU configuration for an application without having to purchase hardware or write the code for a multiple-GPU implementation. When used to predict performance on six scientific applications, our framework produces accurate performance estimates (11% difference on average and 40% maximum difference in a single case) for a range of short- and long-running scientific programs.
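The abstract does not detail the prediction model itself, but the general idea of estimating multi-GPU runtime from the GPU count and input size can be illustrated with a simple analytical sketch. The model form, the `GpuConfig` fields, and all numbers below are illustrative assumptions, not the framework the paper describes.

```python
# Hypothetical sketch of a multi-GPU execution-time estimator.
# The model form, parameter names, and figures are illustrative
# assumptions, not the prediction framework from the paper.

from dataclasses import dataclass


@dataclass
class GpuConfig:
    """Per-GPU characteristics assumed by this toy model."""
    compute_throughput: float      # useful work per second on one GPU (elements/s)
    pcie_bandwidth: float          # host<->device transfer rate (bytes/s)
    kernel_launch_overhead: float  # fixed per-run overhead (s)


def predict_runtime(n_elements: int, bytes_per_element: int,
                    gpu: GpuConfig, num_gpus: int) -> float:
    """Estimate total runtime when the input is split evenly across GPUs.

    Assumes compute time scales with the per-GPU share of the data, each
    GPU pays the transfer cost for its own share, and the fixed launch
    overhead overlaps across GPUs (so it is counted once).
    """
    per_gpu_elements = n_elements / num_gpus
    compute_time = per_gpu_elements / gpu.compute_throughput
    transfer_time = per_gpu_elements * bytes_per_element / gpu.pcie_bandwidth
    return gpu.kernel_launch_overhead + transfer_time + compute_time


if __name__ == "__main__":
    # Example: compare predicted runtimes for 1, 2, and 4 GPUs on a
    # 100M-element single-precision data set (all figures are made up).
    gpu = GpuConfig(compute_throughput=2e9,
                    pcie_bandwidth=6e9,
                    kernel_launch_overhead=1e-3)
    for n in (1, 2, 4):
        t = predict_runtime(100_000_000, 4, gpu, n)
        print(f"{n} GPU(s): predicted {t:.3f} s")
```

A real framework of this kind would calibrate such parameters per application and per GPU configuration; the sketch only shows why predicted speedup falls short of linear once per-GPU transfer costs and fixed overheads are accounted for.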
Related Papers
- Parallel connected-component labeling algorithm for GPGPU applications (2010)
- Parallel Programming for High-Performance Computing on CUDA (2009)
- CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications (2015)
- Introduction to GPGPU Programming Techniques (2010)
- The Latest Video Adapter Architectures. GPGPU Technology. Part 2 (2013)