Data and Computation Transformations for Brook Streaming Applications on Multiprocessors
Citations Over TimeTop 10% of 2006 papers
Abstract
Multicore processors are about to become prevalent in the PC world. Meanwhile, over 90% of the computing cycles are estimated to be consumed by streaming media applications (Rixner et al., 1998). Although stream programming exposes parallelism naturally, we found that achieving high performance on multiprocessors is challenging. Therefore, we develop a parallel compiler for the Brook streaming language with aggressive data and computation transformations. First, we formulate fifteen Brook stream operators in terms of systems of inequalities. Our compiler optimizes the modeled operators to improve memory footprint and performance. Second, the stream computation including both kernels and operators is mapped to the affine partitioning model by modeling each kernel as an implicit loop nest over stream elements. Note that our general abstraction is not limited to Brook. Our modeling and transformations yield high performance on uniprocessors as well. The geometric mean of speedups is 4.7 on ten streaming applications on a Xeon. On multiprocessors, we show that exploiting the standard intra-kernel data parallelism is inferior to our general modeling. The former yields a speedup of 1.5 for ten applications on a 4-way Xeon, while the latter achieves a speedup of 6.4 over the same baseline. We show that our compiler effectively reduces memory footprint, exploits parallelism, and circumvents phase-ordering issues.
Related Papers
- → A machine model for dataflow actors and its applications(2011)46 cited
- → Optimization of data stream processing(2004)16 cited
- → SCnC: Efficient Unification of Streaming with Dynamic Task Parallelism(2011)2 cited
- → Study of a simulated stream machine for dataflow computation(1986)
- → SCnC: Efficient Unification of Streaming with Dynamic Task Parallelism(2015)