DocumentCode :
2439310
Title :
Adapting communication-avoiding LU and QR factorizations to multicore architectures
Author :
Donfack, Simplice ; Grigori, Laura ; Gupta, Alok Kumar
Author_Institution :
INRIA Saclay-Ile de France, Univ. Paris-Sud 11, Orsay, France
fYear :
2010
fDate :
19-23 April 2010
Firstpage :
1
Lastpage :
10
Abstract :
This paper studies algorithms for the LU and QR factorizations of dense matrices. Two communication-optimal algorithms, communication-avoiding LU (CALU) and communication-avoiding QR (CAQR), were recently introduced for distributed-memory architectures. We discuss two algorithms based on CALU and CAQR that are adapted to multicore architectures. They combine the communication-reducing ideas of communication-avoiding algorithms with asynchronism and dynamic task scheduling. For tall and skinny matrices, that is, matrices with many more rows than columns, the two algorithms outperform the corresponding routines from the Intel MKL vendor library on a dual-socket, quad-core machine based on the Intel Xeon EMT64 processor and on a four-socket, quad-core machine based on the AMD Opteron processor. For these matrices, multithreaded CALU outperforms the corresponding dgetrf routine from the Intel MKL library by up to a factor of 2.3 and the dgetrf routine from the ACML library by up to a factor of 5, while multithreaded CAQR outperforms the corresponding dgeqrf routine from the MKL library by a factor of 5.3.
Keywords :
matrix algebra; memory architecture; multi-threading; multiprocessing programs; AMD Opteron processor; Intel MKL; Intel Xeon EMT64 processor; communication optimal algorithms; communication-avoiding LU factorizations; communication-avoiding QR factorizations; dense matrices; distributed memory architectures; multicore architectures; quad-core machine; Binary trees; Dynamic scheduling; Iterative algorithms; Libraries; Matrix decomposition; Memory architecture; Multicore processing; Processor scheduling; Scheduling algorithm; Yarn; LU and QR factorizations; communication avoiding algorithms; multicore architectures;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on
Conference_Location :
Atlanta, GA
ISSN :
1530-2075
Print_ISBN :
978-1-4244-6442-5
Type :
conf
DOI :
10.1109/IPDPS.2010.5470348
Filename :
5470348