DocumentCode
1917571
Title
Poster: Beating MKL and ScaLAPACK at Rectangular Matrix Multiplication Using the BFS/DFS Approach
Author
Demmel, J. ; Eliahu, David ; Fox, A. ; Kamil, Shoaib ; Lipshitz, Benjamin ; Schwartz, Ofer ; Spillinger, Omer
fYear
2012
fDate
10-16 Nov. 2012
Firstpage
1370
Lastpage
1370
Abstract
We implement CARMA, a Communication Avoiding Recursive Matrix Multiplication algorithm. It is the first communication-optimal parallel algorithm for all dimensions of matrices. The shared-memory version of CARMA is only about 50 lines of code, much simpler than 3D SUMMA [8], the rectangular version of 2.5D [9]. CARMA is faster than MKL and ScaLAPACK in practice: (1) for skinny matrices in which k is the largest dimension, up to 7X speedup single-node and 141X speedup distributed; (2) for large square matrices, up to 1.2X speedup single-node and 3X speedup distributed; (3) for other matrix dimensions, comparable performance. The speedup is mainly due to reduced communication (see the bar charts in "Performance Results").
Keywords
digital arithmetic; matrix multiplication; parallel algorithms; recursive functions; shared memory systems; 3D SUMMA; CARMA; MKL; ScaLAPACK; communication avoiding recursive matrix multiplication algorithm; communication optimal parallel algorithm; shared memory; square matrix;
fLanguage
English
Publisher
IEEE
Conference_Titel
High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion:
Conference_Location
Salt Lake City, UT
Print_ISBN
978-1-4673-6218-4
Type
conf
DOI
10.1109/SC.Companion.2012.195
Filename
6495978