• DocumentCode
    1997277
  • Title

    Dynamic Load Balancing of the Adaptive Fast Multipole Method in Heterogeneous Systems

  • Author

    Overman, Robert E. ; Prins, Jan F. ; Miller, Laura A. ; Minion, Michael L.

  • Author_Institution
    Dept. of Comput. Sci., UNC, Chapel Hill, NC, USA
  • fYear
    2013
  • fDate
    20-24 May 2013
  • Firstpage
    1126
  • Lastpage
    1135
  • Abstract
    Simulations of colliding galaxies or fluid dynamics at immersed flexible boundaries are most accurately and efficiently accomplished using the adaptive fast multipole method (AFMM) to solve an underlying n-body problem whose localized density varies with the time-dependent evolution of the system under study. Parallelization of the AFMM presents a challenging load balancing problem that must be addressed dynamically as the system evolves. We consider parallelization of the AFMM for time dependent problems using a heterogeneous shared memory compute node consisting of multi-core processors and GPU accelerators. OpenMP task parallelism is used within the CPU cores to parallelize the construction and maintenance of the adaptive spatial decomposition tree and its traversal to compute far-field interactions at each leaf node in the tree. Concurrently, GPUs evaluate all near-field interactions using all-pairs computations. In addition to accurately resolving many physical phenomena out of reach using the uniform FMM, the more complex AFMM permits the number of bodies in leaf cells to be globally and locally varied in order to minimize the CPU and GPU time. We present a cost model and incremental adjustment strategy to load balance the AFMM on a heterogeneous system. We demonstrate using these techniques that a simulation can maintain load balance over hundreds of time steps on a heterogeneous system with 10 CPU cores and 4 GPUs with less than 2% overhead, while achieving a 98X speedup over a serial computation using a single CPU core.
  • Keywords
    application program interfaces; graphics processing units; parallel processing; physics computing; resource allocation; shared memory systems; tree data structures; AFMM parallelization; CPU cores; GPU accelerators; OpenMP task parallelism; adaptive fast multipole method; adaptive spatial decomposition tree; all-pairs computations; cost model; dynamic load balancing; graphics processing units; heterogeneous shared memory compute node; heterogeneous systems; incremental adjustment strategy; multicore processors; near-field interactions; serial computation; time-dependent problems; Adaptation models; Computational modeling; Graphics processing units; Load management; Load modeling; Octrees; Parallel processing; CUDA; OpenMP task parallelism; accelerators; adaptive fast multipole method; dynamic load balancing; hybrid computing;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International
  • Conference_Location
    Cambridge, MA
  • Print_ISBN
    978-0-7695-4979-8
  • Type

    conf

  • DOI
    10.1109/IPDPSW.2013.218
  • Filename
    6650998