• DocumentCode
    560134
  • Title

    Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels

  • Author

    Haidar, Azzam ; Ltaief, Hatem ; Dongarra, Jack

  • Author_Institution
    Univ. of Tennessee, Knoxville, TN, USA
  • fYear
    2011
  • fDate
    12-18 Nov. 2011
  • Firstpage
    1
  • Lastpage
    11
  • Abstract
    This paper introduces a novel implementation for reducing a symmetric dense matrix to tridiagonal form, which is the preprocessing step toward solving symmetric eigenvalue problems. Based on tile algorithms, the reduction follows a two-stage approach, where the tile matrix is first reduced to symmetric band form prior to the final condensed structure. The challenging trade-off between algorithmic performance and task granularity has been tackled through a grouping technique, which consists of aggregating fine-grained and memory-aware computational tasks during both stages, while sustaining the application's overall high performance. A dynamic runtime environment system then schedules the different tasks in an out-of-order fashion. The performance for the tridiagonal reduction reported in this paper is unprecedented. Our implementation results in up to 50-fold and 12-fold improvement (130 Gflop/s) compared to the equivalent routines from LAPACK V3.2 and Intel MKL V10.3, respectively, on an eight-socket hexa-core AMD Opteron multicore shared-memory system with a matrix size of 24000 × 24000.
  • Keywords
    eigenvalues and eigenfunctions; matrix algebra; parallel algorithms; shared memory systems; Intel MKL V10.3; LAPACK V3.2; aggregated fine-grained kernels; algorithmic performance; condensed forms; dynamic runtime environment system; eight socket hexa-core AMD Opteron multicore shared-memory system; final condensed structure; fine-grained computational task; grouping technique; memory-aware computational task; memory-aware kernels; out-of-order fashion; parallel reduction; symmetric band form; symmetric dense matrix; symmetric eigenvalue problems; task granularity; tile algorithms; tile matrix; Cache memory; Kernel; Layout; Parallel processing; Runtime; Symmetric matrices; Tiles;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for
  • Conference_Location
    Seattle, WA
  • Electronic_ISBN
    978-1-4503-0771-0
  • Type

    conf

  • Filename
    6114396