مرکز منطقه ای اطلاع رساني علوم و فناوري - A Communication-Avoiding, Hybrid-Parallel, Rank-Revealing Orthogonalization Method

DocumentCode :

3206319

Title :

A Communication-Avoiding, Hybrid-Parallel, Rank-Revealing Orthogonalization Method

Author :

Hoemmen, Mark

Author_Institution :

Scalable Algorithms Dept., Sandia Nat. Labs., Albuquerque, NM, USA

fYear :

2011

fDate :

16-20 May 2011

Firstpage :

966

Lastpage :

977

Abstract :

Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, "communication" includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches for orthogonalizing the vectors within each block ("normalization"). TSQR\´s rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5 -- 20) of very long vectors. The shared-memory parallel component uses Intel\´s Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.

Keywords :

distributed memory systems; eigenvalues and eigenfunctions; iterative methods; mathematics computing; matrix decomposition; software packages; CPU; GPU; Gram-Schmidt; Householder QR factorization; Intel threading building blocks; TSQR; Tall Skinny QR; Trilinos software package; block iterative methods; communication-avoiding orthogonalization method; data movement; deflation detection; distributed-memory parallelism; eigenvalue problems; hybrid-parallel orthogonalization method; rank-revealing orthogonalization method; shared-memory parallelism; shared-memory programming models; sparse linear systems; Accuracy; Instruction sets; Iterative methods; Kernel; Parallel processing; Q factor; Tuning;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Parallel & Distributed Processing Symposium (IPDPS), 2011 IEEE International

Conference_Location :

Anchorage, AK

ISSN :

1530-2075

Print_ISBN :

978-1-61284-372-8

Electronic_ISBN :

1530-2075

Type :

conf

DOI :

10.1109/IPDPS.2011.93

Filename :

6012905

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3206319