Title :
A dynamic programming approach to optimizing the blocking strategy for the Householder QR decomposition
Author :
Fukaya, Takeshi ; Yamamoto, Yusaku ; Zhang, Shao-liang
Author_Institution :
Grad. Sch. of Eng., Nagoya Univ., Nagoya
fDate :
Sept. 29 2008-Oct. 1 2008
Abstract :
In this paper, we present a new approach to optimizing the blocking strategy for the householder QR decomposition. In high performance implementations of the householder QR algorithm, it is common to use a blocking technique for the efficient use of the cache memory. There are several well known blocking strategies like the fixed-size blocking and recursive blocking, and usually their parameters such as the block size and the recursion level are tuned according to the target machine and the problem size. However, strategies generated with this kind of parameter optimization constitute only a small fraction of all possible blocking strategies. Given the complex performance characteristics of modern microprocessors, non-standard strategies may prove effective on some machines. Considering this situation, we first propose a new universal model that can express a far larger class of blocking strategies than has been considered so far. Next, we give an algorithm to find a near-optimal strategy from this class using dynamic programming. As a result of this approach, we found an effective blocking strategy that has never been reported. Performance evaluation on the Opteron and Core2 processors show that our strategy achieves about 1.2 times speedup over recursive blocking when computing the QR decomposition of a 6000 times 6000 matrix.
Keywords :
cache storage; dynamic programming; matrix decomposition; microprocessor chips; Core2 processor; Opteron processor; blocking strategy; cache memory; dynamic programming; householder QR decomposition; microprocessor; parameter optimization; Aggregates; Binary trees; Cache memory; Dynamic programming; Heuristic algorithms; High performance computing; Least squares methods; Matrix decomposition; Microprocessors; Singular value decomposition;
Conference_Titel :
Cluster Computing, 2008 IEEE International Conference on
Conference_Location :
Tsukuba
Print_ISBN :
978-1-4244-2639-3
Electronic_ISBN :
1552-5244
DOI :
10.1109/CLUSTR.2008.4663801