DocumentCode :
2990697
Title :
Making TifaMMy fit for tomorrow: Towards future shared memory systems and beyond
Author :
Heinecke, Alexander ; Trinitis, Carsten
Author_Institution :
Inst. für Inf., Tech. Univ. München, Garching, Germany
fYear :
2011
fDate :
4-8 July 2011
Firstpage :
517
Lastpage :
524
Abstract :
In this paper, we present the recent port to and latest results of our cache-oblivious algorithms and implementations of parallel LU decomposition code TifaMMy on two new architectures: SGI's UltraViolet distributed shared memory machine, and Intel's latest x86 architecture Sandy Bridge. TifaMMy's matrix multiplication and LU decomposition routines have been further optimized with regard to these new architectures. Results are discussed and compared with Intel's architecture-specific and optimized numerical Math Kernel Library (MKL) for both the standard C++ version with vectorization compiler switches and TifaMMy's highly optimized vector intrinsics version.
Keywords :
C++ language; cache storage; matrix decomposition; matrix multiplication; optimising compilers; parallel architectures; shared memory systems; C++ version; SGI UltraViolet; cache oblivious algorithm; distributed shared memory machine; matrix multiplication; optimized vector intrinsic version; parallel LU decomposition code TifaMMy; vectorization compiler switches; x86 architecture Sandy Bridge; Blades; Bridges; Computer architecture; Instruction sets; Matrix decomposition; Registers; Sockets; block-recursive; cache-oblivious; linear algebra; parallelization; performance; shared memory platforms;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computing and Simulation (HPCS), 2011 International Conference on
Conference_Location :
Istanbul
Print_ISBN :
978-1-61284-380-3
Type :
conf
DOI :
10.1109/HPCSim.2011.5999869
Filename :
5999869