Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Lower Bounds

Title: Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Lower Bounds
Publication Type: Conference Paper
Year of Publication: 2012
Authors: Ballard, G., Demmel, J., Holtz, O., Lipshitz, B., & Schwartz, O.
Conference Name: 24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2012)
Conference Location: Pittsburgh, PA

A parallel algorithm has perfect strong scaling if its running time on P processors is linear in 1/P, including all communication costs. Distributed-memory parallel algorithms for matrix multiplication with perfect strong scaling have only recently been found. One is based on classical matrix multiplication (Solomonik and Demmel, 2011), and one is based on Strassen's fast matrix multiplication (Ballard, Demmel, Holtz, Lipshitz, and Schwartz, 2012). Both algorithms scale perfectly, but only up to some number of processors, beyond which the inter-processor communication no longer scales.
We obtain a memory-independent communication cost lower bound on classical and Strassen-based distributed-memory matrix multiplication algorithms. These bounds imply that no classical or Strassen-based parallel matrix multiplication algorithm can strongly scale perfectly beyond the ranges already attained by the two parallel algorithms mentioned above. The memory-independent bounds and the strong scaling bounds generalize to other algorithms.
