Minimizing Communication in Sparse Matrix Solvers

TitleMinimizing Communication in Sparse Matrix Solvers
Publication TypeConference Paper
Year of Publication2009
AuthorsMohiyuddin, M., Hoemmen M., Demmel J., & Yelick K.
Conference NameSupercomputing '09
Date Published11/2009
Conference LocationPortlad, Oregon

Data communication within the memory system of a single processor node and between multiple nodes in a system is the bottleneck in many iterative sparse matrix solvers like CG and GMRES: here k iterations of a conventional implementation perform k sparse-matrix-vector-multiplications and Omega(k) vector operations like dot products, resulting in communication that grows by a factor of Omega(k) in both the memory and network. By reorganizing the sparse-matrix kernel to compute a set of matrix-vector products at once and reorganizing the rest of the algorithm accordingly, we can perform k iterations by sending O(log P) messages instead of O(k log P) messages on a parallel machine, and reading matrix A from DRAM to cache just once instead of k times on a sequential machine. This reduces communication to the minimum possible. We combine these techniques to implement GMRES on an 8-core Intel Clovertown, and get speedups of up to 4.3x over standard GMRES, without sacrificing convergence rate or numerical stability.