DocumentCode :
3414746
Title :
A fast scalable universal matrix multiplication algorithm on distributed-memory concurrent computers
Author :
Choi, Jaeyoung
Author_Institution :
Sch. of Comput., Soongsil Univ., Seoul, South Korea
fYear :
1997
fDate :
1-5 Apr 1997
Firstpage :
310
Lastpage :
314
Abstract :
The author presents a fast and scalable matrix multiplication algorithm for distributed-memory concurrent computers, whose performance is independent of the data distribution on processors, and calls it DIMMA (distribution-independent matrix multiplication algorithm). The algorithm is based on two new ideas: it uses a modified pipelined communication scheme to overlap computation and communication effectively, and it exploits the LCM block concept to obtain the maximum performance of the sequential BLAS routine in each processor even when the block size is very small or very large. The algorithm is implemented and compared with SUMMA on the Intel Paragon computer.
Keywords :
distributed memory systems; matrix multiplication; parallel algorithms; parallel machines; pipeline processing; DIMMA; Intel Paragon computer; LCM block concept; SUMMA; block size; computation/communication overlap; distributed-memory concurrent computers; distribution-independent matrix multiplication algorithm; fast scalable universal matrix multiplication algorithm; maximum performance; modified pipelined communication scheme; sequential BLAS routine; Concurrent computing; Contracts; Design optimization; Distributed computing; Grid computing; Jacobian matrices; Linear algebra; Supercomputers; Wrapping;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Processing Symposium, 1997. Proceedings., 11th International
Conference_Location :
Geneva
ISSN :
1063-7133
Print_ISBN :
0-8186-7793-7
Type :
conf
DOI :
10.1109/IPPS.1997.580916
Filename :
580916