# Minimizing Communication in Sparse Matrix Solvers

| Field | Value |
| --- | --- |
| Title | Minimizing Communication in Sparse Matrix Solvers |
| Publication Type | Conference Paper |
| Year of Publication | 2009 |
| Authors | Mohiyuddin M., Hoemmen M., Demmel J., & Yelick K. |
| Conference Name | Supercomputing '09 |
| Date Published | 11/2009 |
| Conference Location | Portland, Oregon |

**Abstract:** Data communication, both within the memory system of a single processor node and between multiple nodes in a system, is the bottleneck in many iterative sparse matrix solvers such as CG and GMRES: k iterations of a conventional implementation perform k sparse matrix-vector multiplications and Ω(k) vector operations such as dot products, so communication grows by a factor of Ω(k) in both the memory system and the network. By reorganizing the sparse matrix kernel to compute a set of matrix-vector products at once, and reorganizing the rest of the algorithm accordingly, we can perform k iterations by sending O(log P) messages instead of O(k log P) messages on a parallel machine, and by reading the matrix A from DRAM to cache just once instead of k times on a sequential machine. This reduces communication to the minimum possible. We combine these techniques to implement GMRES on an 8-core Intel Clovertown and obtain speedups of up to 4.3x over standard GMRES, without sacrificing convergence rate or numerical stability.
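The reorganization the abstract describes centers on computing the Krylov vectors x, Ax, A²x, …, Aᵏx together rather than one per iteration. The sketch below (in Python with NumPy) only shows *what* such a kernel produces; the function name `matrix_powers` and the small dense test matrix are illustrative assumptions, and the naive loop here still touches A k times, whereas the paper's optimized kernel tiles the computation so A moves from DRAM to cache only once:

```python
import numpy as np

def matrix_powers(A, x, k):
    """Return the Krylov basis [x, A x, A^2 x, ..., A^k x] as columns.

    Illustrative only: this naive loop still reads A on every step;
    the paper's communication-avoiding kernel reorganizes the sweep
    so the matrix is read from slow memory just once for all k products.
    """
    V = [x]
    for _ in range(k):
        V.append(A @ V[-1])  # next Krylov vector
    return np.column_stack(V)

# Tiny dense stand-in for a sparse matrix: a 1-D Laplacian stencil.
n, k = 8, 3
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
x = np.ones(n)
V = matrix_powers(A, x, k)
print(V.shape)  # k + 1 basis vectors of length n
```

A solver such as GMRES can then orthogonalize these k + 1 columns in one batch (e.g., with a tall-skinny QR), replacing Ω(k) rounds of dot-product communication with O(log P) messages, as the abstract states.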