DocumentCode :
244773
Title :
Management of an academic HPC cluster: The UL experience
Author :
Varrette, Sebastien ; Bouvry, Pascal ; Cartiaux, Hyacinthe ; Georgatos, Fotis
Author_Institution :
Comput. Sci. & Commun. (CSC) Res. Unit, Univ. of Luxembourg, Luxembourg, Luxembourg
fYear :
2014
fDate :
21-25 July 2014
Firstpage :
959
Lastpage :
967
Abstract :
The intensive growth of processing power, data storage and transmission capabilities has revolutionized many aspects of science. These resources are essential to achieve high-quality results in many application areas. In this context, the University of Luxembourg (UL) has operated since 2007 a High Performance Computing (HPC) facility and the related storage, run by a very small team. Bridging computing and storage is a requirement of the UL service, for reasons that are both legal (certain data may not move) and performance related. Today, people from the three faculties and/or the two interdisciplinary centers within the UL are users of this facility. More specifically, key research priorities such as Systems Bio-medicine (by LCSB) and Security, Reliability & Trust (by SnT) require access to such HPC facilities in order to function in an adequate environment. The management of HPC solutions is a complex enterprise and a constant area for discussion and improvement. The UL HPC facility and its derived deployed services form a computing system that is complex to manage because of its scale: at the time of writing, it consists of 150 servers, 368 nodes (3880 computing cores) and 1996 TB of shared storage, all configured, monitored and operated by only three persons using advanced IT automation solutions based on Puppet [1], FAI [2] and Capistrano [3]. This paper covers all aspects of the management of such a complex infrastructure, whether technical or administrative. Most design choices and implemented approaches have been motivated by several years of experience in addressing research needs, mainly in the HPC area but also in complementary services (typically Web-based). In this context, we tried to address many technological issues in a flexible and convenient way.
This experience report may be of interest to other research centers and universities, in either the public or the private sector, looking for good if not best practices in cluster architecture and management.
Keywords :
Web services; computer facilities; educational computing; educational institutions; parallel processing; storage management; workstation clusters; Capistrano; FAI; LCSB; Puppet; Security, Reliability and Trust; Systems Bio-medicine; UL HPC facility; UL experience; UL service requirement; University of Luxembourg; Web-based services; academic HPC cluster management; administrative management; advanced IT automation solutions; cluster architecture; complex computing system; complex infrastructure management; computing cores; data storage; high performance computing facility; processing power; research centers; shared storage; technical management; technological issues; transmission capabilities; Automation; Context; Educational institutions; IP networks; Security; Servers; Surface acoustic waves; Capistrano; HPC; Puppet; Xen;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computing & Simulation (HPCS), 2014 International Conference on
Conference_Location :
Bologna
Print_ISBN :
978-1-4799-5312-7
Type :
conf
DOI :
10.1109/HPCSim.2014.6903792
Filename :
6903792