Minimizing Communication in Sparse Matrix Solvers
| Title | Minimizing Communication in Sparse Matrix Solvers |
| Publication Type | Conference Paper |
| Year of Publication | 2009 |
| Authors | Mohiyuddin, M., Hoemmen M., Demmel J., & Yelick K. |
| Conference Name | Supercomputing '09 |
| Conference Location | Portland, Oregon |
Data communication, both within the memory system of a single processor node and between multiple nodes in a system, is the bottleneck in many iterative sparse matrix solvers such as CG and GMRES: k iterations of a conventional implementation perform k sparse matrix-vector multiplications and Ω(k) vector operations such as dot products, so communication grows by a factor of Ω(k) in both the memory system and the network. By reorganizing the sparse-matrix kernel to compute a set of matrix-vector products at once, and reorganizing the rest of the algorithm accordingly, we can perform k iterations by sending O(log P) messages instead of O(k log P) messages on a parallel machine, and by reading the matrix A from DRAM to cache just once instead of k times on a sequential machine. This reduces communication to the minimum possible. We combine these techniques to implement GMRES on an 8-core Intel Clovertown, obtaining speedups of up to 4.3x over standard GMRES, without sacrificing convergence rate or numerical stability.
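To make the reorganization concrete, the sketch below shows the mathematical operation the paper's "matrix powers" kernel computes: the Krylov basis vectors Ax, A²x, ..., Aᵏx that replace k separate solver iterations' worth of matrix-vector products. This is only a naive illustration of *what* is computed (the function name `matrix_powers` and the small test matrix are my own); the paper's contribution is a blocked implementation that produces the same vectors while reading A from slow memory once and sending O(log P) messages, which this sketch does not attempt.

```python
import numpy as np

def matrix_powers(A, x, k):
    """Compute the Krylov basis [A x, A^2 x, ..., A^k x].

    A conventional solver performs these k matrix-vector products
    one iteration at a time, re-reading A each time. The
    communication-avoiding kernel computes the same k vectors in a
    single pass over A; here we show only the naive loop.
    """
    V = np.empty((k, x.shape[0]))
    v = x
    for i in range(k):
        v = A @ v          # one matrix-vector product per basis vector
        V[i] = v
    return V

# Small dense stand-in for a sparse matrix: 4x4 tridiagonal [-1, 2, -1]
A = (np.diag(2.0 * np.ones(4))
     - np.diag(np.ones(3), 1)
     - np.diag(np.ones(3), -1))
x = np.ones(4)
V = matrix_powers(A, x, 3)   # rows are A x, A^2 x, A^3 x
```

In the actual algorithm these basis vectors feed a restructured GMRES (with a tall-skinny QR factorization replacing the usual one-vector-at-a-time orthogonalization), so that the solver's convergence behavior is preserved.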