Title :
Benefits of Software Rejuvenation on HPC Systems
Author :
Naksinehaboon, Nichamon ; Taerat, Narate ; Leangsuksun, Chokchai ; Chandler, Clayton F. ; Scott, Stephen L.
Author_Institution :
Coll. of Eng. & Sci., Louisiana Tech Univ., Ruston, LA, USA
Abstract :
Rejuvenation is a technique expected to mitigate failures in HPC systems by replacing, repairing, or resetting system components. Because software rejuvenation incurs only a small overhead, we focus primarily on OS/kernel rejuvenation. In this paper, we propose three rejuvenation scheduling techniques. We also investigate the claim that software rejuvenation prolongs the time to failure in HPC systems. Finally, we compare the computing time lost under the checkpoint/restart mechanism with and without rejuvenation after each checkpoint.
Keywords :
checkpointing; object-oriented programming; operating system kernels; HPC system; OS-kernel rejuvenation; checkpoint-restart mechanism; failure mitigation; rejuvenation scheduling; software rejuvenation; system component repair; system component replacement; system component resetting; Availability; Hardware; Kernel; Numerical models; Software reliability;
Conference_Title :
Parallel and Distributed Processing with Applications (ISPA), 2010 International Symposium on
Conference_Location :
Taipei
Print_ISBN :
978-1-4244-8095-1
Electronic_ISBN :
978-0-7695-4190-7
DOI :
10.1109/ISPA.2010.82