• DocumentCode
    3591171
  • Title
    A flexible scheduling framework for heterogeneous CPU-GPU clusters
  • Author
    Sajjapongse, Kittisak ; Agarwal, Tejaswi ; Becchi, Michela
  • fYear
    2014
  • Firstpage
    1
  • Lastpage
    11
  • Abstract
    In the last few years, thanks to their computational power and progressively increased programmability, GPUs have become part of HPC clusters. As a result, widely used open-source cluster resource managers (e.g., TORQUE and SLURM) have recently been extended with GPU support capabilities. These systems, however, treat GPUs as dedicated resources and provide scheduling mechanisms that often result in resource underutilization and, thereby, in suboptimal performance. We propose a cluster-level scheduler and integrate it with our previously proposed node-level GPU virtualization runtime [1, 2], thus providing a hierarchical cluster resource management framework that allows the efficient use of heterogeneous CPU-GPU clusters. The scheduling policy used by our system is configurable, and our scheduler provides administrators with a high-level API that makes it easy to define custom scheduling policies. We provide two application- and hardware-heterogeneity-aware cluster-level scheduling schemes for hybrid MPI-CUDA applications, co-location-based and latency-reduction-based scheduling, and use them in combination with a preemption-based GPU sharing policy implemented at the node level. We validate our framework on two heterogeneous clusters (one consisting of commodity workstations and the other of high-end nodes with various hardware configurations) and on a mix of communication- and compute-intensive applications. Our experiments show that, by better utilizing the available resources, our scheduling framework outperforms existing batch schedulers in terms of both throughput and application latency.
  • Keywords
    application program interfaces; graphics processing units; message passing; parallel architectures; scheduling; cluster-level scheduler; co-location-based scheduling; flexible scheduling framework; heterogeneous CPU-GPU clusters; high-level API; hybrid MPI-CUDA applications; latency-reduction-based scheduling; open-source cluster resource managers; preemption-based GPU sharing policy; Computer architecture; Graphics processing units; Libraries; Optimal scheduling; Processor scheduling; Runtime; Torque; GPU; HPC clusters; runtime design; scheduling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing (HiPC), 2014 21st International Conference on
  • Print_ISBN
    978-1-4799-5975-4
  • Type
    conf
  • DOI
    10.1109/HiPC.2014.7116892
  • Filename
    7116892