A Comparative Evaluation of Parallel Programming Models for Shared-Memory Architectures
Citations Over Time
Abstract
Nowadays, most computers that are commercially available off-the-shelf (COTS) include hardware features that increase the performance of parallel general-purpose threads (hyper threading, multicore, ccNUMA architectures) or SIMD kernels (CPU vector instructions, GPUs). The purpose of this paper is to perform a compared evaluation of several parallel programming models where each one is fitted to exploit some of these features but also each one requires a different level of programming skills. Four parallel programming models (OpenMP, Intel TBB, Intel ArBB, and CUDA) have been selected. The idea is to cover a wide spectrum of programming models and most of the parallel hardware features included in modern computers. On one hand, OpenMP and TBB platforms, that exploits parallel threads running on multicore systems. On the other hand, ArBB, that combines multicore parallel threads and multicore SIMD features with a simpler programming model, and CUDA that exploits SIMD features of the GPU hardware. Our results obtained with the benchmarks used on this paper suggest that OpenMP and TBB have a lower performance compared to ArBB and CUDA. But also that ArBB performance tends to be comparable with CUDA performance in most cases (although it is normally lower). Thus, there are evidences that a careful designed top range multicore and multisocket architecture, can be comparable in terms of performance with top range GPU cards for many applications, with the advantage of a simpler programming model.
Related Papers
- → Atomic Vector Operations on Chip Multiprocessors(2008)27 cited
- → Atomic Vector Operations on Chip Multiprocessors(2008)24 cited
- Comparison of Shared memory based parallel programming models(2010)
- → Data management and control-flow aspects of an SIMD/SPMD parallel language/compiler(1993)31 cited
- → Introduction(2016)