Title :
Best practices for the deployment and management of production HPC clusters
Author :
McLay, Robert ; Schulz, Karl W. ; Barth, William L. ; Minyard, Tommy
Author_Institution :
Texas Adv. Comput. Center (TACC), Univ. of Texas at Austin, Austin, TX, USA
Abstract :
Commodity-based Linux HPC clusters dominate the scientific computing landscape in both academia and industry, ranging from small research clusters to petascale supercomputers supporting thousands of users. To support broad user communities and manage a user-friendly environment, end-user sites must combine a range of low-level system software with multiple compiler chains, support libraries, and a suite of 3rd party applications. In addition, large systems require bare-metal provisioning and a flexible software management strategy to maintain consistency and upgradeability across thousands of compute nodes. This report documents a Linux operating system framework (LosF), which has evolved over the last seven years to provide an integrated strategy for the deployment of multiple HPC systems at the Texas Advanced Computing Center. Documented within this effort are the high-level cluster configuration options and definitions, bare-metal provisioning, hierarchical HPC software stack design, package management, user environment management tools, user account synchronization, and local customization configurations.
Keywords :
Linux; distributed processing; 3rd party applications; Linux operating system framework; Texas Advanced Computing Center; bare-metal provisioning; commodity-based Linux HPC clusters; hierarchical HPC software stack design; high-level cluster configuration; local customization configurations; low-level system software; multiple compiler chains; package-management; petascale supercomputers; production HPC clusters; research clusters; scientific computing landscape; software management strategy; support libraries; user account synchronization; user environment management tools; Libraries; Linux; Optimized production technology; Servers; Software; Synchronization;
Conference_Title :
High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for
Conference_Location :
Seattle, WA
Electronic_ISBN :
978-1-4503-0771-0