DocumentCode :
3591171
Title :
A flexible scheduling framework for heterogeneous CPU-GPU clusters
Author :
Sajjapongse, Kittisak ; Agarwal, Tejaswi ; Becchi, Michela
fYear :
2014
Firstpage :
1
Lastpage :
11
Abstract :
In the last few years, thanks to their computational power and progressively increased programmability, GPUs have become part of HPC clusters. As a result, widely used open-source cluster resource managers (e.g., TORQUE and SLURM) have recently been extended with GPU support capabilities. These systems, however, treat GPUs as dedicated resources and provide scheduling mechanisms that often result in resource underutilization and, thereby, in suboptimal performance. We propose a cluster-level scheduler and integrate it with our previously proposed node-level GPU virtualization runtime [1, 2], thus providing a hierarchical cluster resource management framework that allows the efficient use of heterogeneous CPU-GPU clusters. The scheduling policy used by our system is configurable, and our scheduler provides administrators with a high-level API for easily defining custom scheduling policies. We provide two application- and hardware-heterogeneity-aware cluster-level scheduling schemes for hybrid MPI-CUDA applications, co-location-based and latency-reduction-based scheduling, and use them in combination with a preemption-based GPU sharing policy implemented at the node level. We validate our framework on two heterogeneous clusters, one consisting of commodity workstations and the other of high-end nodes with various hardware configurations, and on a mix of communication- and compute-intensive applications. Our experiments show that, by better utilizing the available resources, our scheduling framework outperforms existing batch schedulers in terms of both throughput and application latency.
Keywords :
application program interfaces; graphics processing units; message passing; parallel architectures; scheduling; cluster-level scheduler; co-location-based scheduling; flexible scheduling framework; heterogeneous CPU-GPU clusters; high-level API; hybrid MPI-CUDA applications; latency-reduction-based scheduling; open-source cluster resource managers; preemption-based GPU sharing policy; Computer architecture; Graphics processing units; Libraries; Optimal scheduling; Processor scheduling; Runtime; Torque; GPU; HPC clusters; runtime design; scheduling;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computing (HiPC), 2014 21st International Conference on
Print_ISBN :
978-1-4799-5975-4
Type :
conf
DOI :
10.1109/HiPC.2014.7116892
Filename :
7116892