A fast scalable universal matrix multiplication algorithm on distributed-memory concurrent computers

Author

Choi, Jaeyoung

Author_Institution

Sch. of Comput., Soongsil Univ., Seoul, South Korea

fYear

1997

fDate

1-5 Apr 1997

Firstpage

310

Lastpage

314

Abstract

The author presents DIMMA (distribution-independent matrix multiplication algorithm), a fast and scalable matrix multiplication algorithm for distributed-memory concurrent computers whose performance is independent of how the data are distributed across processors. The algorithm is based on two new ideas: it uses a modified pipelined communication scheme to overlap computation and communication effectively, and it exploits the LCM block concept to obtain the maximum performance of the sequential BLAS routine on each processor, whether the block size is too small or too large. The algorithm is implemented and compared with SUMMA on the Intel Paragon computer.
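
The abstract names the LCM block concept without defining it. As a rough illustration only, and not the paper's implementation, the sketch below assumes a standard two-dimensional block-cyclic layout on a P x Q process grid, in which column block k of A lives on grid column k mod Q and row block k of B lives on grid row k mod P. Under that assumption the owning pair repeats with period lcm(P, Q), so block indices that are lcm(P, Q) apart could be grouped into one larger local multiplication. The function names `lcm`, `owner_pair`, and `group_by_owner` are hypothetical.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def owner_pair(k, P, Q):
    """Owners of block index k under an assumed block-cyclic layout.

    Returns (grid row owning row block k of B,
             grid column owning column block k of A).
    """
    return (k % P, k % Q)


def group_by_owner(num_blocks, P, Q):
    """Group block indices 0..num_blocks-1 by their owning (row, column) pair."""
    groups = {}
    for k in range(num_blocks):
        groups.setdefault(owner_pair(k, P, Q), []).append(k)
    return groups


if __name__ == "__main__":
    P, Q = 2, 3  # assumed process grid shape, chosen for illustration
    print("period:", lcm(P, Q))
    for pair, ks in sorted(group_by_owner(12, P, Q).items()):
        print(pair, ks)
```

With P = 2 and Q = 3, every group contains indices exactly 6 apart, which is lcm(2, 3). How DIMMA actually schedules and combines these blocks is described in the paper itself, not here.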

Keywords

distributed memory systems; matrix multiplication; parallel algorithms; parallel machines; pipeline processing; DIMMA; Intel Paragon computer; LCM block concept; SUMMA; block size; computation/communication overlap; distributed-memory concurrent computers; distribution-independent matrix multiplication algorithm; fast scalable universal matrix multiplication algorithm; maximum performance; modified pipelined communication scheme; sequential BLAS routine; Concurrent computing; Contracts; Design optimization; Distributed computing; Grid computing; Jacobian matrices; Linear algebra; Supercomputers; Wrapping;

fLanguage

English

Publisher

IEEE

Conference_Titel

Parallel Processing Symposium, 1997. Proceedings., 11th International

Conference_Location

Geneva

ISSN

1063-7133

Print_ISBN

0-8186-7793-7

Type

conf

DOI

10.1109/IPPS.1997.580916

Filename

580916