DocumentCode
2990697
Title
Making TifaMMy fit for tomorrow: Towards future shared memory systems and beyond
Author
Heinecke, Alexander ; Trinitis, Carsten
Author_Institution
Inst. für Inf., Tech. Univ. München, Garching, Germany
fYear
2011
fDate
4-8 July 2011
Firstpage
517
Lastpage
524
Abstract
In this paper, we present the recent port of TifaMMy, our cache-oblivious parallel LU decomposition code, to two new architectures, SGI's UltraViolet distributed shared memory machine and Intel's latest x86 architecture, Sandy Bridge, along with the latest results on both. TifaMMy's matrix multiplication and LU decomposition routines have been further optimized for these new architectures. Results are discussed and compared with Intel's architecture-specific, optimized numerical Math Kernel Library (MKL) for both the standard C++ version with vectorization compiler switches and TifaMMy's highly optimized vector intrinsics version.
Keywords
C++ language; cache storage; matrix decomposition; matrix multiplication; optimising compilers; parallel architectures; shared memory systems; C++ version; SGI UltraViolet; cache oblivious algorithm; distributed shared memory machine; matrix multiplication; optimized vector intrinsic version; parallel LU decomposition code TifaMMy; vectorization compiler switches; x86 architecture Sandy Bridge; Blades; Bridges; Computer architecture; Instruction sets; Matrix decomposition; Registers; Sockets; block-recursive; cache-oblivious; linear algebra; parallelization; performance; shared memory platforms;
fLanguage
English
Publisher
IEEE
Conference_Titel
High Performance Computing and Simulation (HPCS), 2011 International Conference on
Conference_Location
Istanbul
Print_ISBN
978-1-61284-380-3
Type
conf
DOI
10.1109/HPCSim.2011.5999869
Filename
5999869