• DocumentCode
    32922
  • Title
    A Novel Method for Scaling Iterative Solvers: Avoiding Latency Overhead of Parallel Sparse-Matrix Vector Multiplies
  • Author
    Selvitopi, R. Oguz ; Ozdal, Muhammet Mustafa ; Aykanat, Cevdet

  • Author_Institution
    Dept. of Comput. Eng., Bilkent Univ., Ankara, Turkey
  • Volume
    26
  • Issue
    3
  • fYear
    2015
  • fDate
    March 2015
  • Firstpage
    632
  • Lastpage
    645
  • Abstract
    In parallel linear iterative solvers, sparse matrix vector multiplication (SpMxV) incurs irregular point-to-point (P2P) communications, whereas inner product computations incur regular collective communications. These P2P communications introduce an additional synchronization point with relatively high message latency costs due to small message sizes. In these solvers, each SpMxV is usually followed by an inner product computation that involves the output vector of SpMxV. Here, we exploit this property to propose a novel parallelization method that avoids the latency costs and synchronization overhead of P2P communications. Our method involves a computational and a communication rearrangement scheme. The computational rearrangement provides an alternative method for forming the input vector of SpMxV and allows P2P and collective communications to be performed in a single phase. The communication rearrangement realizes this opportunity by embedding P2P communications into global collective communication operations. The proposed method bounds the maximum number of messages communicated, regardless of the sparsity pattern of the matrix. The downside, however, is an increased message volume and a negligible amount of redundant computation. We favor reducing the message latency costs at the expense of increasing message volume. Yet, we propose two iterative-improvement-based heuristics to alleviate the increase in volume through one-to-one task-to-processor mapping. Our experiments on two supercomputers, Cray XE6 and IBM BlueGene/Q, on up to 2,048 processors show that the proposed parallelization method exhibits superior scalable performance compared to the conventional parallelization method.
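    The structural property the abstract exploits — that each SpMxV output vector immediately feeds an inner product — is visible in the conjugate gradient loop (CG is named in the keywords). Below is a minimal serial CG sketch in Python/NumPy, not the paper's method itself; the comments mark where a conventional row-parallel implementation would perform the irregular P2P exchange and the collective all-reduce that the paper proposes to merge into a single communication phase. All function and variable names here are illustrative assumptions.

    ```python
    import numpy as np

    def cg(A, b, tol=1e-10, max_iter=100):
        """Conjugate gradient for SPD A. Comments indicate where a
        conventional row-parallel solver communicates."""
        x = np.zeros_like(b)
        r = b - A @ x
        p = r.copy()
        rr = r @ r
        for _ in range(max_iter):
            # SpMxV: a parallel solver needs irregular P2P messages here
            # to gather the off-processor entries of p (an extra
            # synchronization point with high per-message latency).
            q = A @ p
            # Inner product on q, the SpMxV output: a regular collective
            # (all-reduce). Because this collective immediately follows
            # the SpMxV, the paper embeds the P2P exchange into it,
            # eliminating the separate synchronization point.
            alpha = rr / (p @ q)
            x += alpha * p
            r -= alpha * q
            rr_new = r @ r  # second all-reduce of the iteration
            if rr_new < tol:
                break
            p = r + (rr_new / rr) * p
            rr = rr_new
        return x
    ```

    The point of the sketch is the adjacency of the two communication-bearing operations (`A @ p` and `p @ q`), which is what makes the single-phase rearrangement possible.
    
    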
  • Keywords
    iterative methods; mathematics computing; matrix multiplication; multiprocessing systems; parallel programming; Cray XE6 supercomputers; IBM BlueGene/Q supercomputer; P2P communications; SpMxV; communication rearrangement scheme; computational scheme; irregular point-to-point communications; iterative solver scaling; message latency cost; message size; parallel linear iterative solvers; parallel sparse-matrix vector multiplier; parallelization method; product computation; sparse matrix vector multiplication; synchronization point; Iterative methods; Parallel algorithms; Partitioning algorithms; Program processors; Sparse matrices; Synchronization; Vectors; Parallel linear iterative solvers; avoiding latency; collective communication; conjugate gradient; hiding latency; inner product computation; iterative improvement heuristic; message latency overhead; point-to-point communication; sparse matrix vector multiplication;
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Parallel and Distributed Systems
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type
    jour

  • DOI
    10.1109/TPDS.2014.2311804
  • Filename
    6766662