DocumentCode
2418642
Title
Data distribution and communication schemes for IQMR method on massively distributed memory computers
Author
Yang, Laurence Tianruo
Author_Institution
St. Francis Xavire Univ., Antigonish, NS, Canada
fYear
2000
fDate
2000
Firstpage
299
Lastpage
306
Abstract
We study the parallelization of the IQMR method for the solutions of linear systems of equations with unsymmetric coefficient matrices. The IQMR method is an improved version of the quasi-minimal residual (IQMR) method by using the Lanczos process as a major component combining elements of numerical stability and parallel algorithm design. The algorithm is derived such that all inner products and matrix-vector multiplications of a single iteration step are independent and communication time required for the inner product can be overlapped efficiently with computation time. Two important schemes are discussed. What is the best possible data distribution and which communication network topology is most suitable for the IQMR method on massively parallel distributed memory computers. A theoretical model of data distribution and communication phases is presented mainly based on (Hoekstra et al., 1991; 1992) which allows us to give a detailed execution time complexity analysis and investigates its usefulness. It is shown that the implementation of IQMR, with a row-block decomposition of the coefficient matrix, on a ring of communication structure is the most efficient choice. Performance tests of the developed parallel IQMR algorithm have been carried out on the massively distributed memory system and experimental timing results are compared with the theoretical execution time complexity analysis
Keywords
computational complexity; data communication; distributed memory systems; mathematics computing; matrix algebra; numerical stability; parallel algorithms; parallel machines; performance evaluation; IQMR method; Lanczos process; communication network topology; communication time; computation time; data communication; data distribution; execution time complexity analysis; linear systems of equations; massively parallel distributed memory computers; matrix-vector multiplications; numerical stability; parallel algorithm design; performance tests; quasi-minimal residual method; timing; unsymmetric coefficient matrices; Algorithm design and analysis; Communication networks; Computer networks; Concurrent computing; Distributed computing; Equations; Linear systems; Network topology; Numerical stability; Parallel algorithms;
fLanguage
English
Publisher
ieee
Conference_Titel
Parallel Processing, 2000. Proceedings. 2000 International Workshops on
Conference_Location
Toronto, Ont.
ISSN
1530-2016
Print_ISBN
0-7695-0771-9
Type
conf
DOI
10.1109/ICPPW.2000.869116
Filename
869116
Link To Document