Title :
Performance Analysis and Optimization of the Tiled Cholesky Factorization on NUMA Machines
Author :
Jeannot, Emmanuel
Author_Institution :
LaBRI, Inria Bordeaux Sud-Ouest, Bordeaux, France
Abstract :
We discuss performance issues of the tiled Cholesky factorization on non-uniform memory access-time (NUMA) shared-memory machines. We show how to optimize thread placement and data placement in order to achieve performance gains of up to 50% compared to state-of-the-art libraries such as Plasma or MKL.
Keywords :
matrix decomposition; parallel processing; performance evaluation; shared memory systems; MKL; NUMA machines; Plasma; data placement optimization; nonuniform memory access time shared memory machines; performance analysis; performance gain; state-of-the-art libraries; thread placement optimization; tiled Cholesky factorization; Instruction sets; Kernel; Message systems; Parallel processing; Resource management; Tiles; Vectors; Cholesky factorization; NUMA; thread placement
Conference_Title :
Parallel Architectures, Algorithms and Programming (PAAP), 2012 Fifth International Symposium on
Conference_Location :
Taipei
Print_ISBN :
978-1-4673-4566-8
DOI :
10.1109/PAAP.2012.38