Title of article :
Extending an Application-Level Checkpointing Tool to Provide Fault Tolerance Support to OpenMP Applications
Author/Authors :
Losada, Nuria (University of A Coruña - Computer Architecture Group, Spain); Martín, María J. (University of A Coruña - Computer Architecture Group, Spain); Rodríguez, Gabriel (University of A Coruña - Computer Architecture Group, Spain); González, Patricia (University of A Coruña - Computer Architecture Group, Spain)
Abstract :
Despite the increasing popularity of shared-memory systems, there is a lack of tools that provide fault tolerance support to shared-memory applications. CPPC (ComPiler for Portable Checkpointing) is an application-level checkpointing tool designed to add fault tolerance to long-running MPI applications. This paper presents an extension to CPPC that enables the checkpointing of OpenMP applications. The proposed solution preserves the main characteristics of CPPC: portability and reduced checkpoint file size. The proposal is evaluated using the OpenMP NAS Parallel Benchmarks, and the results show that most of the applications incur small checkpoint overheads.
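The general idea of application-level checkpointing in an OpenMP code can be illustrated with a minimal hand-written sketch. This is not CPPC's interface or the paper's mechanism: the functions `save_checkpoint`, `load_checkpoint` and `run`, and the raw binary file layout, are hypothetical. The sketch assumes, for simplicity, that checkpoints are taken only between parallel regions, where all threads have joined. It shows why application-level checkpoints can be small: only the state needed to resume (here an iteration counter and one array) is written, not a full process image. Unlike the portable format the abstract attributes to CPPC, this raw binary layout is not portable across architectures.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical file layout: iteration counter, array length, array data (raw binary). */
static int save_checkpoint(const char *path, int iter, const double *a, size_t n) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    int ok = fwrite(&iter, sizeof iter, 1, f) == 1
          && fwrite(&n, sizeof n, 1, f) == 1
          && fwrite(a, sizeof *a, n, f) == n;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Returns 0 and fills *iter and a[] if a valid checkpoint for n elements exists. */
static int load_checkpoint(const char *path, int *iter, double *a, size_t n) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;                      /* no checkpoint file: start from scratch */
    size_t saved_n = 0;
    int ok = fread(iter, sizeof *iter, 1, f) == 1
          && fread(&saved_n, sizeof saved_n, 1, f) == 1
          && saved_n == n
          && fread(a, sizeof *a, n, f) == n;
    fclose(f);
    return ok ? 0 : -1;
}

/* Runs `total` iterations of an OpenMP loop, checkpointing every `every` iterations.
   Returns the iteration this run started from (0 on a fresh start). */
static int run(const char *path, double *a, size_t n, int total, int every) {
    int start = 0;
    if (load_checkpoint(path, &start, a, n) != 0) {
        start = 0;                          /* discard any partially read state */
        for (size_t i = 0; i < n; i++) a[i] = 0.0;
    }
    for (int it = start; it < total; it++) {
        #pragma omp parallel for
        for (long i = 0; i < (long)n; i++)
            a[i] += 1.0;                    /* independent per-element update */
        /* Checkpoint after the parallel region, when all threads have joined. */
        if ((it + 1) % every == 0)
            save_checkpoint(path, it + 1, a, n);
    }
    return start;
}
```

If a first call to `run` stops before finishing, a second call with the same file path resumes from the last saved iteration rather than from iteration 0; compiling without `-fopenmp` simply runs the loop sequentially.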
Keywords :
parallel programming, OpenMP, fault tolerance, checkpointing
Journal title :
Journal of Universal Computer Science (J.UCS)