Making TifaMMy fit for tomorrow: Towards future shared memory systems and beyond

Author

Heinecke, Alexander ; Trinitis, Carsten

Author_Institution

Inst. für Inf., Tech. Univ. München, Garching, Germany

fYear

2011

fDate

4-8 July 2011

Firstpage

517

Lastpage

524

Abstract

In this paper, we present the recent port of TifaMMy, our parallel LU decomposition code based on cache-oblivious algorithms, to two new architectures, together with the latest results: SGI's UltraViolet distributed shared memory machine and Intel's latest x86 architecture, Sandy Bridge. TifaMMy's matrix multiplication and LU decomposition routines have been further optimized for these new architectures. We discuss the results and compare them with Intel's architecture-specific, optimized Math Kernel Library (MKL), both for the standard C++ version built with vectorization compiler switches and for TifaMMy's highly optimized vector intrinsics version.

Keywords

C++ language; cache storage; matrix decomposition; matrix multiplication; optimising compilers; parallel architectures; shared memory systems; C++ version; SGI UltraViolet; cache oblivious algorithm; distributed shared memory machine; matrix multiplication; optimized vector intrinsic version; parallel LU decomposition code TifaMMy; vectorization compiler switches; x86 architecture Sandy Bridge; Blades; Bridges; Computer architecture; Instruction sets; Matrix decomposition; Registers; Sockets; block-recursive; cache-oblivious; linear algebra; parallelization; performance; shared memory platforms;

fLanguage

English

Publisher

IEEE

Conference_Titel

High Performance Computing and Simulation (HPCS), 2011 International Conference on

Conference_Location

Istanbul

Print_ISBN

978-1-61284-380-3

Type

conf

DOI

10.1109/HPCSim.2011.5999869

Filename

5999869