• DocumentCode
    3588910
  • Title

    Flux: A Next-Generation Resource Management Framework for Large HPC Centers

  • Author

    Ahn, Dong H. ; Garlick, Jim ; Grondona, Mark ; Lipari, Don ; Springmeyer, Becky ; Schulz, Martin

  • Author_Institution
    Comput. Directorate, Lawrence Livermore Nat. Lab., Livermore, CA, USA
  • fYear
    2014
  • Firstpage
    9
  • Lastpage
    17
  • Abstract
    Resource and job management software is crucial to High Performance Computing (HPC) for efficient application execution. However, current systems and approaches can no longer keep up with the challenges large HPC centers are facing due to ever-increasing system scales, resource and workload diversity, interplays between various resources (e.g., between compute clusters and a global file system), and complexity of resource constraints such as strict power budgeting. To address this gap, we propose Flux, an extensible job and resource management framework specifically designed to deal with the requirements of next-generation HPC centers. Flux targets an entire computing facility as one common pool of diverse sets of resources, enabling the facility to accommodate site-wide constraints (e.g., for power limits). Yet, its scalable and distributed design still offers scalable and effective scheduling strategies. This paper details the design of Flux and describes and evaluates our initial prototyping effort of the key run-time components. Our results show that our run-time prototype provides strong and predictable scalability.
  • Keywords
    parallel processing; resource allocation; scheduling; Flux; HPC centers; application execution; high performance computing; job management software; next-generation resource management framework; power budgeting; resource management software; run-time components; run-time prototype; scheduling strategies; Computational modeling; Processor scheduling; Prototypes; Resource management; Scheduling; Software; Synchronization; communication framework; key value store; resource management; run-time; scalable process management services;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing Workshops (ICPPW), 2014 43rd International Conference on
  • ISSN
    1530-2016
  • Type

    conf

  • DOI
    10.1109/ICPPW.2014.15
  • Filename
    7103433