• DocumentCode
    2996354
  • Title
    Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters
  • Author
    Lifflander, Jonathan ; Evans, G. Carl ; Arya, Anshu ; Kale, Laxmikant V.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Illinois, Urbana, IL, USA
  • fYear
    2012
  • fDate
    21-25 May 2012
  • Firstpage
    2404
  • Lastpage
    2413
  • Abstract
    Dynamic scheduling and varying decomposition granularity are well-known techniques for achieving high performance in parallel computing. Heterogeneous clusters with highly data-parallel processors, such as GPUs, present unique problems for the application of these techniques. These systems reveal a dichotomy between grain sizes: decompositions ideal for the CPUs may yield insufficient data-parallelism for accelerators, and decompositions targeted at the GPU may decrease performance on the CPU. This problem is typically ameliorated by statically scheduling a fixed amount of work for agglomeration. However, determining the ideal amount of work to compose requires experimentation because it varies between architectures and problem configurations. This paper describes a novel methodology for dynamically agglomerating work units at runtime and scheduling them on accelerators. This approach is demonstrated in the context of two applications: an n-body particle simulation, which offloads particle interaction work, and a parallel dense LU solver, which relocates DGEMM kernels to the GPU. In both cases, dynamic agglomeration yields results comparable to or better than statically scheduling the work across a variety of system configurations.
  • Keywords
    grain size; graphics processing units; parallel processing; processor scheduling; CPU; DGEMM kernels; GPU; data-parallel processors; data-parallelism; dichotomy; dynamic agglomeration; dynamic scheduling; grain sizes; heterogeneous clusters; n-body particle simulation; parallel computing; parallel dense LU solver; particle interaction work; problem configurations; statically scheduling; system configurations; varying decomposition granularity; work agglomeration; Arrays; Dynamic scheduling; Grain size; Graphics processing unit; Kernel; Runtime; CUDA; GPGPU; accelerator; adaptive runtime; agglomeration
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4673-0974-5
  • Type
    conf
  • DOI
    10.1109/IPDPSW.2012.297
  • Filename
    6270612