DocumentCode
1783377
Title
Reconstructing Householder Vectors from Tall-Skinny QR
Author
Ballard, Grey ; Demmel, J. ; Grigori, Laura ; Jacquelin, Mathias ; Hong Diep Nguyen ; Solomonik, Edgar
Author_Institution
Sandia Nat. Labs., Livermore, CA, USA
fYear
2014
fDate
19-23 May 2014
Firstpage
1159
Lastpage
1170
Abstract
The Tall-Skinny QR (TSQR) algorithm is more communication efficient than the standard Householder algorithm for QR decomposition of matrices with many more rows than columns. However, TSQR produces a different representation of the orthogonal factor and therefore requires more software development to support the new representation. Further, implicitly applying the orthogonal factor to the trailing matrix in the context of factoring a square matrix is more complicated and costly than with the Householder representation. We show how to perform TSQR and then reconstruct the Householder vector representation with the same asymptotic communication efficiency and little extra computational cost. We demonstrate the high performance and numerical stability of this algorithm both theoretically and empirically. The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost compared to Householder QR and lower bandwidth and latency costs compared with Communication-Avoiding QR (CAQR) algorithm. As a result, our final parallel QR algorithm outperforms ScaLAPACK and Elemental implementations of Householder QR and our implementation of CAQR on the Hopper Cray XE6 NERSC system. We also provide algorithmic improvements to the ScaLAPACK and CAQR algorithms.
Keywords
mathematics computing; matrix decomposition; numerical stability; parallel algorithms; software engineering; vectors; CAQR algorithm; Hopper Cray XE6 NERSC system; QR matrix decomposition; TSQR algorithm; asymptotic communication efficiency; communication-avoiding QR algorithm; householder algorithm; householder vector reconstruction; householder vector representation; latency costs; numerical stability; orthogonal factor representation; parallel QR algorithms; software development; square matrix factoring; tall-skinny QR algorithm; trailing matrix; Algorithm design and analysis; Bandwidth; Computational efficiency; Matrix decomposition; Program processors; Standards; Vectors; Householder QR; TSQR; communication-avoiding;
fLanguage
English
Publisher
ieee
Conference_Titel
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location
Phoenix, AZ
ISSN
1530-2075
Print_ISBN
978-1-4799-3799-8
Type
conf
DOI
10.1109/IPDPS.2014.120
Filename
6877344
Link To Document