Title :
Empirical Autotuning of Two-level Parallel Linear Algebra Routines on Large cc-NUMA Systems
Author :
Cámara, J. ; Cuenca, Javier ; Giménez, Domingo ; Vidal, Antonio M.
Author_Institution :
Dept. of Informatic & Syst., Univ. of Murcia, Murcia, Spain
Abstract :
In large cc-NUMA systems, making efficient use of the different levels of the memory hierarchy is difficult, and the performance of multithreaded library implementations degrades as the number of cores increases, causing a significant loss of efficiency. To alleviate this problem, routines with multilevel parallelism can be developed by combining OpenMP and BLAS parallelism. This can deliver higher performance, but it requires an autotuning technique to select the appropriate number of threads at each level. The selection can be made through theoretical models of the execution time or through an empirical installation methodology. This work analyses several installation techniques for a two-level matrix multiplication routine, with the aim of developing a methodology that is valid for other linear algebra routines on large cc-NUMA systems. The basic ideas of the two-level parallelisation and of the installation methodology are discussed, and some experimental results are presented.
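The two ideas in the abstract, a two-level parallel matrix multiplication and an installation phase that empirically picks the thread counts for each level, can be illustrated with a minimal sketch. It assumes Python thread pools as stand-ins: the outer pool plays the role of the OpenMP level and the inner pool the role of one multithreaded BLAS call. The names `two_level_matmul`, `install` and `select` are hypothetical, and the search shown (time every pair with outer * inner <= cores for a few installation sizes, then reuse the configuration of the nearest installed size) is one simple empirical scheme, not necessarily the one used in the paper. Because of Python's GIL, the measured times will not reflect real cc-NUMA behaviour; only the structure carries over.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def two_level_matmul(A, B, outer, inner):
    """Compute C = A * B with two nested levels of threads (sketch only)."""
    n = len(A)
    Bt = [list(col) for col in zip(*B)]  # columns of B, for row-by-column products
    C = [None] * n

    def inner_gemm(rows):
        # Inner level: stand-in for one multithreaded BLAS call using `inner` threads.
        def compute_row(i):
            C[i] = [sum(a * b for a, b in zip(A[i], col)) for col in Bt]
        with ThreadPoolExecutor(max_workers=inner) as pool:
            list(pool.map(compute_row, rows))

    # Outer level: stand-in for an OpenMP parallel region with `outer` threads,
    # each owning a contiguous block of rows of A (and of C).
    blocks = [range(k * n // outer, (k + 1) * n // outer) for k in range(outer)]
    with ThreadPoolExecutor(max_workers=outer) as pool:
        list(pool.map(inner_gemm, blocks))
    return C


def install(sizes, cores, repeats=1):
    """Installation phase: for each size n, time every (outer, inner) pair
    with outer * inner <= cores and store the fastest one."""
    table = {}
    for n in sizes:
        A = [[float(i + j) for j in range(n)] for i in range(n)]
        B = [[float(i - j) for j in range(n)] for i in range(n)]
        best = None
        for outer in range(1, cores + 1):
            for inner in range(1, cores // outer + 1):
                start = time.perf_counter()
                for _ in range(repeats):
                    two_level_matmul(A, B, outer, inner)
                elapsed = (time.perf_counter() - start) / repeats
                if best is None or elapsed < best[0]:
                    best = (elapsed, outer, inner)
        table[n] = (best[1], best[2])
    return table


def select(table, n):
    """Run-time phase: reuse the configuration installed for the closest size."""
    nearest = min(table, key=lambda m: abs(m - n))
    return table[nearest]


if __name__ == "__main__":
    table = install(sizes=[16, 32], cores=4)
    outer, inner = select(table, 30)  # uses the entry installed for n = 32
    print(table, (outer, inner))
```

In a compiled implementation, the outer loop would instead be an OpenMP parallel region with nested parallelism enabled, and `inner_gemm` would be a BLAS `dgemm` call whose thread count is set beforehand (for example with MKL's `mkl_set_num_threads`). On a large cc-NUMA system, the candidate pairs and the set of installation sizes would have to be chosen so that the installation time stays acceptable.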
Keywords :
linear algebra; microprocessor chips; multi-threading; multiprocessing systems; BLAS parallelism; OpenMP parallelism; empirical autotuning; large cc-NUMA systems; memory hierarchy; multithreading implementations; non-uniform memory access; two level parallel linear algebra routines; Educational institutions; Instruction sets; Libraries; Linear algebra; Multithreading; Prototypes; MKL; OpenMP; autotuning; cc-NUMA; linear algebra; multithreading;
Conference_Title :
Parallel and Distributed Processing with Applications (ISPA), 2012 IEEE 10th International Symposium on
Conference_Location :
Leganes
Print_ISBN :
978-1-4673-1631-6
DOI :
10.1109/ISPA.2012.127