• DocumentCode
    656213
  • Title

    A Dynamic Resource Management System for Network-Attached Accelerator Clusters

  • Author

    Prabhakaran, Suraj ; Iqbal, M. ; Rinke, Sebastian ; Wolf, Felix

  • Author_Institution
    German Res. Sch. for Simulation Sci., Lab. for Parallel Program., RWTH Aachen Univ., Aachen, Germany
  • fYear
    2013
  • fDate
    1-4 Oct. 2013
  • Firstpage
    773
  • Lastpage
    782
  • Abstract
    Over the years, cluster systems have become increasingly heterogeneous as cluster nodes are equipped with one or more accelerators such as graphics processing units (GPUs). These devices are typically attached to a compute node via PCI Express. As a consequence, batch systems such as TORQUE/Maui and SLURM have been extended to be aware of those additional resources tightly coupled with compute nodes. Recent advances in accelerator technology have given rise to the possibility of using network-attached accelerators in addition to node-attached accelerators. However, current batch systems do not support this new usage scenario of accelerators. This work focuses on batch-system support for allocating network-attached accelerators. The most important feature of the proposed batch system is its ability to dynamically allocate network-attached accelerators to jobs at application runtime. We discuss our extensions to the TORQUE/Maui batch system and elaborate on its features in the Dynamic Accelerator-Cluster Architecture, which describes an integration of network-attached accelerators into a cluster system. We also evaluate the dynamic allocation scenarios and show how batch systems can be designed to provide support for more flexible and dynamic cluster systems.
  • Keywords
    batch processing (computers); graphics processing units; multiprocessing systems; peripheral interfaces; GPU; PCI Express; SLURM batch system; TORQUE-Maui batch system; cluster nodes; dynamic accelerator-cluster architecture; dynamic cluster systems; dynamic resource management system; graphic processing units; network-attached accelerator cluster system; node-attached accelerators; Computer architecture; Dynamic scheduling; Graphics processing units; Method of moments; Resource management; Servers; Torque; dynamic resource management; dynamic scheduling; heterogeneous architectures;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing (ICPP), 2013 42nd International Conference on
  • Conference_Location
    Lyon
  • ISSN
    0190-3918
  • Type

    conf

  • DOI
    10.1109/ICPP.2013.91
  • Filename
    6687416