Title :
Making TifaMMy fit for tomorrow: Towards future shared memory systems and beyond
Author :
Heinecke, Alexander ; Trinitis, Carsten
Author_Institution :
Inst. für Inf., Tech. Univ. München, Garching, Germany
Abstract :
In this paper, we present the recent port of TifaMMy, our parallel LU decomposition code built on cache-oblivious algorithms and implementations, to two new architectures, along with the latest results on them: SGI's UltraViolet distributed shared memory machine and Intel's latest x86 architecture, Sandy Bridge. TifaMMy's matrix multiplication and LU decomposition routines have been further optimized for these new architectures. Results for both the standard C++ version, compiled with vectorization switches, and TifaMMy's highly optimized vector intrinsics version are discussed and compared with Intel's architecture-specific, optimized Math Kernel Library (MKL).
Keywords :
C++ language; cache storage; matrix decomposition; matrix multiplication; optimising compilers; parallel architectures; shared memory systems; C++ version; SGI UltraViolet; cache oblivious algorithm; distributed shared memory machine; optimized vector intrinsic version; parallel LU decomposition code TifaMMy; vectorization compiler switches; x86 architecture Sandy Bridge; Blades; Bridges; Computer architecture; Instruction sets; Registers; Sockets; block-recursive; cache-oblivious; linear algebra; parallelization; performance; shared memory platforms
Conference_Title :
High Performance Computing and Simulation (HPCS), 2011 International Conference on
Conference_Location :
Istanbul
Print_ISBN :
978-1-61284-380-3
DOI :
10.1109/HPCSim.2011.5999869