Title :
Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters
Author :
Lifflander, Jonathan ; Evans, G. Carl ; Arya, Anshu ; Kale, Laxmikant V.
Author_Institution :
Dept. of Comput. Sci., Univ. of Illinois, Urbana, IL, USA
Abstract :
Dynamic scheduling and varying decomposition granularity are well-known techniques for achieving high performance in parallel computing. Heterogeneous clusters with highly data-parallel processors, such as GPUs, present unique problems for the application of these techniques. These systems reveal a dichotomy between grain sizes: decompositions ideal for the CPUs may yield insufficient data-parallelism for accelerators, and decompositions targeted at the GPU may decrease performance on the CPU. This problem is typically ameliorated by statically scheduling a fixed amount of work for agglomeration. However, determining the ideal amount of work to compose requires experimentation because it varies between architectures and problem configurations. This paper describes a novel methodology for dynamically agglomerating work units at runtime and scheduling them on accelerators. This approach is demonstrated in the context of two applications: an n-body particle simulation, which offloads particle interaction work, and a parallel dense LU solver, which relocates DGEMM kernels to the GPU. In both cases, dynamic agglomeration yields results comparable to or better than statically scheduling the work across a variety of system configurations.
Keywords :
grain size; graphics processing units; parallel processing; processor scheduling; CPU; DGEMM kernels; GPU; data-parallel processors; data-parallelism; dichotomy; dynamic agglomeration; dynamic scheduling; grain sizes; heterogeneous clusters; n-body particle simulation; parallel computing; parallel dense LU solver; particle interaction work; problem configurations; statically scheduling; system configurations; varying decomposition granularity; work agglomeration; Arrays; Dynamic scheduling; Grain size; Graphics processing unit; Kernel; Runtime; CUDA; GPGPU; accelerator; adaptive runtime; agglomeration
Conference_Title :
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-0974-5
DOI :
10.1109/IPDPSW.2012.297