Multicore Surprises: Lessons Learned from Optimizing Sweep3D on the Cell Broadband Engine
Published: 2007
Abstract
The Cell Broadband Engine (BE) processor can deliver an impressive level of performance for scientific applications. Reaching that level requires exploiting several dimensions of parallelism: thread-level parallelism across several synergistic processing elements, data-streaming parallelism, vector parallelism in the form of 128-bit SIMD operations, and pipeline parallelism by issuing multiple instructions in the same clock cycle. While working toward the best achievable performance for Sweep3D, we encountered many pleasant surprises, such as very high floating-point performance, reaching 64% of the theoretical peak in double precision, and an overall speedup ranging from 4.5 times over "heavy iron" processors to more than 20 times over conventional processors.
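To make the vector dimension concrete: a 128-bit SIMD register holds two 64-bit doubles, so each vector operation performs two double-precision operations at once. The following is a minimal sketch under stated assumptions, not the Sweep3D kernel from the paper and not SPU intrinsic code. It uses the GCC/Clang generic vector extension (`vector_size(16)`) as a portable stand-in, and the helper name `daxpy_simd` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* A 128-bit vector holding two 64-bit doubles, using the GCC/Clang
   generic vector extension (a portable stand-in, not SPU intrinsics). */
typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical helper: y[i] = a * x[i] + y[i] for 0 <= i < n.
   The main loop processes two doubles per vector operation; a scalar
   tail handles the last element when n is odd. memcpy is used for the
   loads and stores so the arrays need no special alignment. */
static void daxpy_simd(int n, double a, const double *x, double *y)
{
    assert(n >= 0);
    const v2df va = {a, a};          /* broadcast the scalar into both lanes */
    int i = 0;
    for (; i + 1 < n; i += 2) {
        v2df vx, vy;
        memcpy(&vx, &x[i], sizeof vx);
        memcpy(&vy, &y[i], sizeof vy);
        vy = va * vx + vy;           /* computes a*x + y in both lanes */
        memcpy(&y[i], &vy, sizeof vy);
    }
    for (; i < n; i++)               /* scalar remainder */
        y[i] = a * x[i] + y[i];
}
```

For example, with `x = {1, 2, 3, 4, 5}`, `y` filled with `10.0` and `a = 2.0`, the call leaves `y = {12, 14, 16, 18, 20}`. This sketch covers only the SIMD dimension; a real SPE kernel would use the SPU's own intrinsics and would also have to stream data into each SPE's local store, which is where the thread-level and data-streaming dimensions come in.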