Communication-Avoiding Parallel Strassen: Implementation and Performance

TitleCommunication-Avoiding Parallel Strassen: Implementation and Performance
Publication TypeConference Paper
Year of Publication2012
AuthorsLipshitz, B., Ballard G., Demmel J., & Schwartz O.
Conference NameSupercomputing 2012
Date Published11/2012
Conference LocationSalt Lake City, UT

Matrix multiplication is a fundamental kernel of
many high performance and scientific computing applications.
Most parallel implementations use classical O(n3) matrix multiplication,
even though there exist algorithms with lower arithmetic
complexity. We recently obtained a new Communication-
Avoiding Parallel Strassen algorithm (CAPS), based on Strassen’s
fast matrix multiplication, that minimizes communication (SPAA
’12). It communicates asymptotically less than all classical and
all previous Strassen-based algorithms, and it attains theoretical
lower bounds.

In this paper we show that CAPS is also faster in practice. We
benchmark and compare its performance to previous algorithms
on Hopper (Cray XE6), Intrepid (IBM BG/P), and Franklin
(Cray XT4). We demonstrate significant speedups over previous
algorithms both for large matrices and for small matrices on large
numbers of processors.We model and analyze the performance of
CAPS and predict its performance on future exascale platforms.

Communication-Avoiding Parallel Strassen Implementation and Performance.pdf455.09 KB