A Scalable Tridiagonal Solver for GPUs
Citations Over TimeTop 10% of 2011 papers
Abstract
We present the design and evaluation of a scalable tridiagonal solver targeted for GPU architectures. We observed that two distinct steps are required to solve a large tridiagonal system in parallel: 1) breaking down a problem into multiple sub problems each of which is independent of other, and 2) solving the sub problems using an efficient algorithm. We propose a hybrid method of tiled parallel cyclic reduction(tiled PCR) and thread-level parallel Thomas algorithm(p-Thomas). Algorithm transition from tiled PCR to p-Thomas is determined by input system size and hardware capability in order to achieve optimal performance. The proposed method is scalable as it can cope with various input system sizes by properly adjusting algorithm trasition point. Our method on a NVidia GTX480 shows up to 8.3x and 49x speedups over multithreaded and sequential MKL implementations on a 3.33GHz Intel i7 975 in double precision, respectively.
Related Papers
- → $LU$-Decompositions of Tridiagonal Irreducible H-Matrices(1986)4 cited
- → The recursive tri-reduction method for tridiagonal linear systems(1991)1 cited
- → A comparison of sequential and parallel elimination methods for tridiagonal matrices(1989)
- A New Algorithm for Computing the Inverse of Tridiagonal and Periodic Tridiagonal Matrices(2010)
- → Appendix: Tridiagonal Matrix Algorithm(2022)