DocumentCode
592113
Title
Performance Analysis and Optimization of the Tiled Cholesky Factorization on NUMA Machines
Author
Jeannot, Emmanuel
Author_Institution
LaBRI, Inria Bordeaux Sud-Ouest, Bordeaux, France
fYear
2012
fDate
17-20 Dec. 2012
Firstpage
210
Lastpage
217
Abstract
We discuss performance issues of the tiled Cholesky factorization on non-uniform memory access-time (NUMA) shared-memory machines. We show how to optimize thread placement and data placement to achieve performance gains of up to 50% compared to state-of-the-art libraries such as Plasma or MKL.
Keywords
matrix decomposition; parallel processing; performance evaluation; shared memory systems; MKL; NUMA machines; Plasma; data placement optimization; nonuniform memory access time shared memory machines; performance analysis; performance gain; state-of-the-art libraries; thread placement optimization; tiled Cholesky factorization; Instruction sets; Kernel; Message systems; Parallel processing; Resource management; Tiles; Vectors; Cholesky factorization; NUMA; thread placement;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Architectures, Algorithms and Programming (PAAP), 2012 Fifth International Symposium on
Conference_Location
Taipei
ISSN
2168-3034
Print_ISBN
978-1-4673-4566-8
Type
conf
DOI
10.1109/PAAP.2012.38
Filename
6424759