Synchronization-Free Automatic Parallelization for Arbitrarily Nested Affine Loops
Abstract
This paper presents a new approach for extracting the synchronization-free parallelism available in program loop nests. The approach extracts parallelism from arbitrarily nested parametric loop nests whose loop bounds and data accesses are affine functions of the loop indices and symbolic parameters. Parallelization is realized using the transitive closure of a dependence graph. Parallelism is obtained by creating kernels of computations, expressed with OpenMP, that can be executed independently on multi-core computers. The speed-up of the parallel code produced by the approach is studied using the NAS benchmark suite, and the results of an experimental study carried out on the Intel Xeon Phi, a many integrated core (MIC) architecture, are discussed.
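To make the role of the transitive closure concrete, here is a minimal Python sketch on a toy loop nest. It rests on several assumptions that are not from the paper. The statement `a[i][j] = 2 * a[i-1][j-1] + 1` and the bounds `N`, `M` are hypothetical. The closure is computed over explicitly enumerated statement instances with Warshall's algorithm, rather than symbolically over parametric affine relations. A thread pool stands in for an OpenMP parallel loop. All function names are illustrative. A slice is taken to be a dependence source (an instance with no incoming dependence) together with every instance it reaches in the closure.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical toy loop nest (not from the paper):
#   for i in range(N):
#       for j in range(M):
#           S(i, j): a[i][j] = 2 * a[i-1][j-1] + 1   (boundary value j if i == 0 or j == 0)
N, M = 4, 5


def iterations():
    """All statement instances (i, j) in lexicographic order."""
    return list(product(range(N), range(M)))


def dependence_graph(iters):
    """Flow dependences: (i-1, j-1) writes a[i-1][j-1], which (i, j) then reads."""
    R = {x: set() for x in iters}
    for i, j in iters:
        if i > 0 and j > 0:
            R[(i - 1, j - 1)].add((i, j))
    return R


def transitive_closure(R):
    """R+ via Warshall's algorithm on the finite instance graph.

    Concrete, non-parametric stand-in for a symbolic closure of affine relations.
    """
    closure = {u: set(vs) for u, vs in R.items()}
    for k in closure:
        for u in closure:
            if k in closure[u]:
                closure[u] |= closure[k]
    return closure


def slices(iters, R, Rplus):
    """A slice is a source (no incoming dependence) plus everything it reaches."""
    targets = set().union(*R.values())
    sources = [x for x in iters if x not in targets]
    # Sorting gives lexicographic order, which respects every dependence here.
    return [sorted({s} | Rplus[s]) for s in sources]


def stmt(a, i, j):
    prev = a[i - 1][j - 1] if i > 0 and j > 0 else j
    a[i][j] = 2 * prev + 1


def run_sequential():
    a = [[0] * M for _ in range(N)]
    for i, j in iterations():
        stmt(a, i, j)
    return a


def run_sync_free():
    iters = iterations()
    R = dependence_graph(iters)
    sl = slices(iters, R, transitive_closure(R))
    # Race freedom requires the slices to partition the iteration space.
    if sorted(x for s in sl for x in s) != iters:
        raise ValueError("slices overlap or miss iterations")
    a = [[0] * M for _ in range(N)]

    def run_slice(members):
        for i, j in members:
            stmt(a, i, j)

    # Stand-in for an OpenMP parallel loop over slices: no barriers between them.
    with ThreadPoolExecutor() as pool:
        list(pool.map(run_slice, sl))
    return a
```

In this toy nest the `i` loop carries a dependence with distance (1, 1), and parallelizing the `j` loop alone would need a barrier after every `i` iteration. The closure instead yields N + M - 1 = 8 independent diagonal slices, and `run_sync_free()` produces the same array as `run_sequential()`. Python threads serialize on the GIL, so the sketch demonstrates independence of the slices rather than speed-up.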