Communication-optimal parallel algorithm for Strassen's matrix multiplication
Top 1% of 2012 papers
Abstract
Parallel matrix multiplication is one of the most studied fundamental problems in distributed and high-performance computing. We present a new parallel algorithm based on Strassen's fast matrix multiplication that minimizes communication. The algorithm outperforms all known parallel matrix multiplication algorithms, classical and Strassen-based, both asymptotically and in practice. A critical bottleneck in parallelizing Strassen's algorithm is the communication between the processors. Ballard, Demmel, Holtz, and Schwartz (SPAA '11) prove lower bounds on these communication costs, using expansion properties of the underlying computation graph. Our algorithm matches these lower bounds and is therefore communication-optimal. It exhibits perfect strong scaling within the maximum possible range.
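For background, the parallel algorithm builds on Strassen's sequential recursion, which multiplies two n-by-n matrices with seven (rather than eight) recursive half-size products. A minimal serial sketch in Python follows; this is not the paper's communication-avoiding parallel scheme, and the NumPy representation and `cutoff` base case are illustration choices:

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Multiply square matrices A and B (n a power of two) using
    Strassen's 7-multiplication recursion."""
    n = A.shape[0]
    if n <= cutoff:                      # fall back to classical multiply
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Strassen's seven recursive products
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    # Recombine the products into the four quadrants of C
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

Cutting one multiplication per level is what yields the O(n^log2(7)) ≈ O(n^2.81) arithmetic cost; the paper's contribution is distributing this recursion across processors so that inter-processor communication also meets the lower bound.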
Related Papers
- Communication costs of Strassen's matrix multiplication (2014), cited 18 times
- Fast Multiplication and Sparse Structures (2004), cited 4 times
- Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm (2007), cited 1 time
- Improved algorithms for Boolean matrix multiplication via opportunistic matrix multiplication (2021)
- Algorithms Analysis of Matrix Multiplication (2001)