• DocumentCode
    2482837
  • Title
    Compiler-enhanced incremental checkpointing for OpenMP applications
  • Author
    Bronevetsky, Greg ; Marques, Daniel ; Pingali, Keshav ; McKee, Sally ; Rugina, Radu
  • Author_Institution
    Lawrence Livermore Nat. Lab., Livermore, CA, USA
  • fYear
    2009
  • fDate
    23-29 May 2009
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    As modern supercomputing systems reach the peta-flop performance range, they grow in both size and complexity. This makes them increasingly vulnerable to failures from a variety of causes. Checkpointing is a popular technique for tolerating such failures, enabling applications to periodically save their state and restart computation after a failure. Although many automated system-level checkpointing solutions are currently available to HPC users, manual application-level checkpointing remains more popular due to its superior performance. This paper improves the performance of automated checkpointing via a compiler analysis for incremental checkpointing. This analysis, which works with both sequential and OpenMP applications, reduces checkpoint sizes by as much as 80% and enables asynchronous checkpointing.
  • Keywords
    application program interfaces; checkpointing; parallel programming; program compilers; software fault tolerance; OpenMP applications; asynchronous checkpointing; automated system-level checkpointing; compiler analysis; compiler-enhanced incremental checkpointing; fault tolerance; peta-flop performance; supercomputing systems; Application software; Checkpointing; Monitoring; National security; Optimizing compilers; Performance analysis; Read-write memory; Runtime; System software; Yarn;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on
  • Conference_Location
    Rome
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4244-3751-1
  • Electronic_ISBN
    1530-2075
  • Type
    conf
  • DOI
    10.1109/IPDPS.2009.5160999
  • Filename
    5160999