A Medium-Grained Algorithm for Sparse Tensor Factorization
Top 10% of 2016 papers by citations.
Abstract
Modeling multi-way data can be accomplished using tensors, which are data structures indexed along three or more dimensions. Tensors are increasingly used to analyze extremely large and sparse multi-way datasets in life sciences, engineering, and business. The canonical polyadic decomposition (CPD) is a popular tensor factorization for discovering latent features and is most commonly computed via the method of alternating least squares (CPD-ALS). The computational time and memory required to compute the CPD limit the size and dimensionality of the tensors that can be solved on a typical workstation, making distributed solution approaches the only viable option. Most methods for distributed-memory systems distribute the tensor in a coarse-grained, one-dimensional fashion that requires the dense matrix factors to be fully replicated on each node, a prohibitive memory cost. Recent work overcomes this limitation by using a fine-grained decomposition of the tensor nonzeros, at the cost of computationally expensive hypergraph partitioning. To that end, we present a medium-grained decomposition that avoids complete factor replication and communication, while eliminating the need for expensive pre-processing steps. We use a hybrid MPI+OpenMP implementation that exploits multi-core architectures with a low memory footprint. We theoretically analyze the scalability of the coarse-, medium-, and fine-grained decompositions and experimentally compare them across a variety of datasets. Experiments show that the medium-grained decomposition reduces communication volume by 36-90% compared to the coarse-grained decomposition, is 41-76x faster than a state-of-the-art MPI code, and is 1.5-5.0x faster than the fine-grained decomposition on 1024 cores.
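For background on the CPD-ALS method the abstract builds on, here is a minimal serial sketch of rank-R CP-ALS for a dense 3-way tensor using NumPy. This is illustrative only: the paper's contribution is the distributed decomposition of a *sparse* tensor, not this dense serial kernel, and the function names (`khatri_rao`, `cp_als`) are chosen here for exposition.

```python
import numpy as np

def khatri_rao(B, C):
    # Column-wise Khatri-Rao product: row (j*K + k) equals B[j] * C[k].
    R = B.shape[1]
    return (B[:, None, :] * C[None, :, :]).reshape(-1, R)

def cp_als(X, rank, iters=100, seed=0):
    """Rank-`rank` canonical polyadic decomposition of a dense
    3-way tensor X via alternating least squares (CPD-ALS)."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Mode-n unfoldings of X, matching the Khatri-Rao column ordering above.
    X0 = X.reshape(I, -1)
    X1 = X.transpose(1, 0, 2).reshape(J, -1)
    X2 = X.transpose(2, 0, 1).reshape(K, -1)
    for _ in range(iters):
        # Each update: MTTKRP (X_n @ khatri_rao) followed by solving
        # against the Hadamard product of the other factors' Gram matrices.
        A = X0 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X1 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X2 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

The MTTKRP (matricized tensor times Khatri-Rao product) in each update is the dominant cost, and it is this kernel whose data distribution and communication the coarse-, medium-, and fine-grained schemes compared in the paper organize differently.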
Related Papers
- Sparse Matrix Multiplication On An Associative Processor (2014), 35 citations
- Efficient Tiled Sparse Matrix Multiplication through Matrix Signatures (2020), 18 citations
- Fast Recursive Nonnegative Standard and Hierarchical Tucker Decomposition (2019), 10 citations
- SpMacho - Optimizing Sparse Linear Algebra Expressions with Probabilistic Density Estimation (2015), 7 citations
- Blocking Techniques for Sparse Matrix Multiplication on Tensor Accelerators (2022), 1 citation