DocumentCode
2978692
Title
A three-parameter fast Givens QR algorithm for superscalar processors
Author
Carrig, James J., Jr. ; Meyer, Gerard G L
Author_Institution
Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD, USA
Volume
2
fYear
1996
fDate
12-16 Aug 1996
Firstpage
11
Abstract
We present a three-parameter fast Givens QR algorithm that exploits parallelism to improve performance on superscalar processors. We identify a choice of parameter values for which the new algorithm reduces to the standard algorithm, but show that non-standard values minimize the number of cache misses, memory references, and pipeline stalls. Using a tractable model of a superscalar machine architecture, we derive rules for estimating the optimal combination of parameter values. Applying these rules, we observe speedups over the standard algorithm of 2.4 on the Intel Pentium Pro system, 2.0 on a single thin POWER2 processor of the IBM SP2, 1.6 on a single wide POWER2 processor of the IBM SP2, and 4.2 on a single R8000 processor of the SGI POWER Challenge XL.
Keywords
Kalman filters; eigenvalues and eigenfunctions; least squares approximations; parallel processing; performance evaluation; signal processing; IBM SP2; Intel Pentium Pro system; POWER2 processor; SGI POWER Challenge XL; cache misses; fast Givens QR algorithm; memory references; parallelism; parameter values; performance improvement; pipeline stalls; single R8000 processor; superscalar machine architecture; superscalar processors; tractable model; Algorithm design and analysis; High performance computing; Laboratories; Least squares methods; Libraries; Matrix decomposition; Parallel processing; Pipelines; Power system modeling; Signal processing algorithms;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Processing, 1996. Vol. 3. Software, Proceedings of the 1996 International Conference on
Conference_Location
Ithaca, NY
ISSN
0190-3918
Print_ISBN
0-8186-7623-X
Type
conf
DOI
10.1109/ICPP.1996.537375
Filename
537375