Title :
Analyzing Checkpointing Trends for Applications on the IBM Blue Gene/P System
Author :
Naik, H. ; Gupta, R. ; Beckman, P.
Author_Institution :
Math. & Comput. Sci. Div., Argonne Nat. Lab., Argonne, IL, USA
Abstract :
Current petascale systems have tens of thousands of hardware components and complex system software stacks, which increase the probability of faults occurring during the lifetime of a process. Checkpointing has been a popular method of providing fault tolerance in high-end systems. While considerable research has been done to optimize checkpointing, in practice the method still imposes a high overhead on users. In this paper, we study the checkpointing overhead seen by applications running on leadership-class machines such as the IBM Blue Gene/P at Argonne National Laboratory. We study various applications and design a methodology to assist users in understanding and choosing a checkpointing frequency and in reducing the overhead incurred. In particular, we study three popular applications (the Grid-Based Projector-Augmented Wave application, the Carr-Parrinello Molecular Dynamics application, and a Nek5000 computational fluid dynamics application) and analyze their memory usage and possible checkpointing trends on 32,768 processors of the Blue Gene/P system.
Keywords :
checkpointing; computer architecture; fault tolerant computing; Carr-Parrinello molecular dynamics application; IBM Blue Gene/P System; Nek5000 computational fluid dynamics application; checkpointing trends analyzation; complex system software stack; fault tolerance computing; grid based projector augmented wave application; leadership class machine; petascale system; Application software; Computational fluid dynamics; Design methodology; Fault tolerant systems; Frequency; Hardware; Laboratories; Optimization methods; System software; BG/P; Blue Gene; Fault Tolerance; Full Checkpoint; Petascale
Conference_Title :
Parallel Processing Workshops, 2009. ICPPW '09. International Conference on
Conference_Location :
Vienna
Print_ISBN :
978-1-4244-4923-1
Electronic_ISBN :
1530-2016
DOI :
10.1109/ICPPW.2009.42