Matrix Multiplication on Multidimensional Torus Networks

TitleMatrix Multiplication on Multidimensional Torus Networks
Publication TypeConference Paper
Year of Publication2012
AuthorsSolomonik, E., & Demmel J.
Conference Name10th International Meeting on High-Performance Computing for Computational Science (VECPAR 2012)
Conference LocationKobe, Japan
Abstract

Blocked matrix multiplication algorithms such as Cannon's algorithm and SUMMA have a 2-dimensional communication structure. We introduce a generalized 'Split-Dimensional' version of Cannon's algorithm (SD-Cannon) with higher-dimensional and bidirectional communication structure.
This algorithm is useful for higher-dimensional torus interconnects that can achieve more injection bandwidth than single-link bandwidth. On a bidirectional torus network of dimension d, SD-Cannon can lower the algorithmic bandwidth cost by a factor of up to d. With rectangular collectives, SUMMA also achieves the lower bandwidth cost but has a higher latency cost. We use Charm++ virtualization to effciently map SD-Cannon on unbalanced and odd-dimensional torus network partitions. Our performance study on Blue Gene/P demonstrates that an MPI version of SD-Cannon can exploit multiple communication links and improve performance.

AttachmentSize
Matrix multiplication on multidimensional torus networks.pdf281.87 KB