Minimizing Communication in Sparse Matrix Solvers
| Title | Minimizing Communication in Sparse Matrix Solvers |
| Publication Type | Conference Paper |
| Year of Publication | 2009 |
| Authors | Mohiyuddin, M., M. Hoemmen, J. Demmel, and K. Yelick |
| Conference Name | Supercomputing '09 |
| Date Published | 11/2009 |
| Abstract | Data communication within the memory system of a single processor node and between multiple nodes in a system is the bottleneck in many iterative sparse matrix solvers like CG and GMRES: here k iterations of a conventional implementation perform k sparse-matrix-vector-multiplications and Omega(k) vector operations like dot products, resulting in communication that grows by a factor of Omega(k) in both the memory and network. By reorganizing the sparse-matrix kernel to compute a set of matrix-vector products at once and reorganizing the rest of the algorithm accordingly, we can perform k iterations by sending O(log P) messages instead of O(k log P) messages on a parallel machine, and reading matrix A from DRAM to cache just once instead of k times on a sequential machine. This reduces communication to the minimum possible. We combine these techniques to implement GMRES on an 8-core Intel Clovertown, and get speedups of up to 4.3x over standard GMRES, without sacrificing convergence rate or numerical stability. |
