DocumentCode :
1917571
Title :
Poster: Beating MKL and ScaLAPACK at Rectangular Matrix Multiplication Using the BFS/DFS Approach
Author :
Demmel, J. ; Eliahu, David ; Fox, A. ; Kamil, Shoaib ; Lipshitz, Benjamin ; Schwartz, Oded ; Spillinger, Omer
fYear :
2012
fDate :
10-16 Nov. 2012
Firstpage :
1370
Lastpage :
1370
Abstract :
We implement CARMA, a Communication-Avoiding Recursive Matrix Multiplication algorithm. It is the first communication-optimal parallel algorithm for all matrix dimensions. The shared-memory version of CARMA is only ~50 lines of code, much simpler than 3D SUMMA [8], the rectangular version of 2.5D [9]. CARMA is faster than MKL and ScaLAPACK in practice: for skinny matrices in which k is the largest dimension, it achieves up to 7X speedup on a single node and 141X speedup distributed; for large square matrices, up to 1.2X speedup on a single node and 3X speedup distributed; for other matrix dimensions, performance is comparable. The speedup is mainly due to reduced communication (see bar charts in "Performance Results").
Keywords :
digital arithmetic; matrix multiplication; parallel algorithms; recursive functions; shared memory systems; 3D SUMMA; CARMA; MKL; ScaLAPACK; communication avoiding recursive matrix multiplication algorithm; communication optimal parallel algorithm; shared memory; square matrix;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion
Conference_Location :
Salt Lake City, UT
Print_ISBN :
978-1-4673-6218-4
Type :
conf
DOI :
10.1109/SC.Companion.2012.195
Filename :
6495978