Previous work has shown that a lower bound on the num ber of words moved between large, slow memory and small, fast memory of size M by any conventional (nonStrassen like) direct linear algebra algorithm (matrix multiply, the LU, Cholesky, QR factorizations, . . . ) is Ω(#f lops/p(M )). This holds for dense or sparse matrices. There are analogous lower bounds for the number of messages, and for parallel algorithms instead of sequential algorithms.
Our goal here is to find algorithms that attain these lower bounds on interesting classes of sparse matrices. We focus on matrices for which there is a lower bound on the number of flops of their Cholesky factorization. Our Cholesky lower bounds on communication hold for any possible ordering of the rows and columns of the matrix, and so are globally op timal in this sense. For matrices arising from discretization on two dimensional and three dimensional regular grids, we discuss sequential and parallel algorithms that are optimal in terms of communication. The algorithms turn out to require combining previously known sparse and dense Cholesky al gorithms in simple ways.
