Regional Information Center for Science and Technology - Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters

DocumentCode :

2996354

Title :

Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters

Author :

Lifflander, Jonathan ; Evans, G. Carl ; Arya, Anshu ; Kale, Laxmikant V.

Author_Institution :

Dept. of Comput. Sci., Univ. of Illinois, Urbana, IL, USA

fYear :

2012

fDate :

21-25 May 2012

Firstpage :

2404

Lastpage :

2413

Abstract :

Dynamic scheduling and varying decomposition granularity are well-known techniques for achieving high performance in parallel computing. Heterogeneous clusters with highly data-parallel processors, such as GPUs, present unique problems for the application of these techniques. These systems reveal a dichotomy between grain sizes: decompositions ideal for the CPUs may yield insufficient data-parallelism for accelerators, and decompositions targeted at the GPU may decrease performance on the CPU. This problem is typically ameliorated by statically scheduling a fixed amount of work for agglomeration. However, determining the ideal amount of work to compose requires experimentation because it varies between architectures and problem configurations. This paper describes a novel methodology for dynamically agglomerating work units at runtime and scheduling them on accelerators. This approach is demonstrated in the context of two applications: an n-body particle simulation, which offloads particle interaction work, and a parallel dense LU solver, which relocates DGEMM kernels to the GPU. In both cases, dynamic agglomeration yields results comparable to or better than statically scheduling the work, across a variety of system configurations.
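The contrast the abstract draws between a fixed, statically chosen amount of agglomerated work and a batch size adjusted at runtime can be illustrated with a small sketch. This is not the authors' method, which the abstract does not detail: the class name `DynamicAgglomerator`, the hill-climbing tuning policy, and the simulated cost model (a fixed launch overhead plus a per-unit cost) are all assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Deque, List


class DynamicAgglomerator:
    """Hypothetical sketch: combine small work units into batches and tune
    the batch size at runtime from the throughput that is observed."""

    def __init__(self, launch: Callable[[List[int]], float],
                 initial_batch: int = 4, min_batch: int = 1,
                 max_batch: int = 1024) -> None:
        self.launch = launch            # runs one combined batch, returns elapsed time
        self.batch_size = initial_batch
        self.min_batch = min_batch
        self.max_batch = max_batch
        self.pending: Deque[int] = deque()
        self.last_rate = 0.0            # throughput of the previous full batch
        self.direction = 1              # +1: grow the batch, -1: shrink it

    def submit(self, unit: int) -> None:
        """Queue one work unit; launch once enough units have accumulated."""
        self.pending.append(unit)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Launch everything pending as a single agglomerated batch."""
        if not self.pending:
            return
        batch = list(self.pending)
        self.pending.clear()
        elapsed = self.launch(batch)
        # Only full batches steer the tuning; a partial final flush does not.
        if len(batch) >= self.batch_size:
            rate = len(batch) / elapsed if elapsed > 0 else float("inf")
            self._adapt(rate)

    def _adapt(self, rate: float) -> None:
        """Hill climbing (an assumed policy): keep changing the batch size in
        the same direction while throughput improves, otherwise reverse."""
        if rate < self.last_rate:
            self.direction = -self.direction
        self.last_rate = rate
        if self.direction > 0:
            self.batch_size = min(self.max_batch, self.batch_size * 2)
        else:
            self.batch_size = max(self.min_batch, self.batch_size // 2)


def simulated_launch(batch: List[int]) -> float:
    """Assumed cost model: fixed launch overhead plus a per-unit cost."""
    return 10.0 + 1.0 * len(batch)


if __name__ == "__main__":
    agg = DynamicAgglomerator(simulated_launch, initial_batch=2, max_batch=16)
    for unit in range(32):
        agg.submit(unit)
    agg.flush()
    print("final batch size:", agg.batch_size)
```

In a real heterogeneous runtime, `launch` would hand the combined batch to an accelerator and time it. Under the simulated cost model the fixed overhead is amortized over more units as the batch grows, so the sketch keeps enlarging the batch until it reaches `max_batch`.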

Keywords :

grain size; graphics processing units; parallel processing; processor scheduling; CPU; DGEMM kernels; GPU; data-parallel processors; data-parallelism; dichotomy; dynamic agglomeration; dynamic scheduling; grain sizes; heterogeneous clusters; n-body particle simulation; parallel computing; parallel dense LU solver; particle interaction work; problem configurations; statically scheduling; system configurations; varying decomposition granularity; work agglomeration; Arrays; Graphics processing unit; Kernel; Runtime; CUDA; GPGPU; accelerator; adaptive runtime; agglomeration

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International

Conference_Location :

Shanghai

Print_ISBN :

978-1-4673-0974-5

Type :

conf

DOI :

10.1109/IPDPSW.2012.297

Filename :

6270612

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2996354