A parallel algorithm has perfect strong scaling if its running
time on P processors is linear in 1=P, including all commu
nication costs. Distributedmemory parallel algorithms for
matrix multiplication with perfect strong scaling have only
recently been found. One is based on classical matrix multi
plication (Solomonik and Demmel, 2011), and one is based
on Strassen's fast matrix multiplication (Ballard, Demmel,
Holtz, Lipshitz, and Schwartz, 2012). Both algorithms scale
perfectly, but only up to some number of processors where
the interprocessor communication no longer scales.
We obtain a memoryindependent communication cost
lower bound on classical and Strassenbased distributed
memory matrix multiplication algorithms. These bounds
imply that no classical or Strassenbased parallel matrix
multiplication algorithm can strongly scale perfectly beyond
the ranges already attained by the two parallel algorithms
mentioned above. The memoryindependent bounds and the
strong scaling bounds generalize to other algorithms.
