• DocumentCode
    625658
  • Title

    A Communication-Optimal N-Body Algorithm for Direct Interactions

  • Author

    Driscoll, Mike ; Georganas, Evangelos ; Koanantakool, Penporn ; Solomonik, Edgar ; Yelick, Katherine

  • Author_Institution
    Comput. Sci. Div., Univ. of California, Berkeley, Berkeley, CA, USA
  • fYear
    2013
  • fDate
    20-24 May 2013
  • Firstpage
    1075
  • Lastpage
    1084
  • Abstract
    We consider the problem of communication avoidance in computing interactions between a set of particles in scenarios with and without a cutoff radius for interaction. Our strategy, which we show to be optimal in communication, divides the work in the iteration space rather than simply dividing the particles over processors, so more than one processor may be responsible for computing updates to a single particle. Similar to a force decomposition in molecular dynamics, this approach requires up to √p times more memory than a particle decomposition, but reduces communication costs by factors up to √p and is often faster in practice than a particle decomposition [1]. We examine a generalized force decomposition algorithm that tolerates the memory limited case, i.e. when memory can only hold c copies of the particles for c = 1, 2, ..., √p. When c = 1, the algorithm degenerates into a particle decomposition; similarly when c = √p, the algorithm uses a force decomposition. We present a proof that the algorithm is communication-optimal and reduces critical path latency and bandwidth costs by factors of c2 and c, respectively. Performance results from experiments on up to 24K cores of Cray XE-6 and 32K cores of IBM BlueGene/P machines indicate that the algorithm reduces communication in practice. In some cases, it even outperforms the original force decomposition approach because the right choice of c strikes a balance between the costs of collective and point-to-point communication. Finally, we extend the analysis to include a cutoff radius for direct evaluation of force interactions. We show that with a cutoff, communication optimality still holds. We sketch a generalized algorithm for multi-dimensional space and assess its performance for 1D and 2D simulations on the same systems.
  • Keywords
    N-body problems; critical path analysis; iterative methods; molecular dynamics method; parallel algorithms; physics computing; 1D simulations; 2D simulations; Cray XE-6 cores; IBM BlueGene/P machines; communication avoidance problem; communication-optimal N-body algorithm; computing interactions; computing updates; critical path latency; cutoff radius; direct force interaction evaluation; generalized force decomposition algorithm; iteration space; memory limited case; molecular dynamics; multidimensional space; particle decomposition; point-to-point communication; Algorithm design and analysis; Bandwidth; Computational modeling; Force; Heuristic algorithms; Memory management; Program processors; communication-avoiding algorithms; parallel algorithms; particle methods;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on
  • Conference_Location
    Boston, MA
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4673-6066-1
  • Type

    conf

  • DOI
    10.1109/IPDPS.2013.108
  • Filename
    6569886