• DocumentCode
    2990697
  • Title

    Making TifaMMy fit for tomorrow: Towards future shared memory systems and beyond

  • Author

    Heinecke, Alexander ; Trinitis, Carsten

  • Author_Institution
    Inst. für Inf., Tech. Univ. München, Garching, Germany
  • fYear
    2011
  • fDate
    4-8 July 2011
  • Firstpage
    517
  • Lastpage
    524
  • Abstract
    In this paper, we present the recent port of our cache-oblivious algorithms and implementations in the parallel LU decomposition code TifaMMy to two new architectures, SGI's UltraViolet distributed shared memory machine and Intel's latest x86 architecture, Sandy Bridge, together with the latest results obtained on them. TifaMMy's matrix multiplication and LU decomposition routines have been further optimized for these new architectures. Results are discussed and compared with Intel's architecture-specific, optimized Math Kernel Library (MKL) for both the standard C++ version with vectorization compiler switches and TifaMMy's highly optimized vector intrinsics version.
  • Keywords
    C++ language; cache storage; matrix decomposition; matrix multiplication; optimising compilers; parallel architectures; shared memory systems; C++ version; SGI UltraViolet; cache oblivious algorithm; distributed shared memory machine; matrix multiplication; optimized vector intrinsic version; parallel LU decomposition code TifaMMy; vectorization compiler switches; x86 architecture Sandy Bridge; Blades; Bridges; Computer architecture; Instruction sets; Matrix decomposition; Registers; Sockets; block-recursive; cache-oblivious; linear algebra; parallelization; performance; shared memory platforms;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing and Simulation (HPCS), 2011 International Conference on
  • Conference_Location
    Istanbul
  • Print_ISBN
    978-1-61284-380-3
  • Type

    conf

  • DOI
    10.1109/HPCSim.2011.5999869
  • Filename
    5999869