• DocumentCode
    166647
  • Title

    POSTER: Energy-performance tradeoffs in multilevel checkpoint strategies

  • Author

    Bautista Gomez, Leonardo A. ; Balaprakash, Prasanna ; Bouguerra, Mohamed-Slim ; Wild, Stefan M. ; Cappello, Franck ; Hovland, Paul D.

  • Author_Institution
    Math. & Comput. Sci. Div., Argonne Nat. Lab., Argonne, IL, USA
  • fYear
    2014
  • fDate
    22-26 Sept. 2014
  • Firstpage
    278
  • Lastpage
    279
  • Abstract
    Increased complexity of computer architectures, consideration of power constraints, and expected failure rates of hardware components make the design and analysis of energy-efficient fault-tolerance schemes an increasingly challenging and important task. We develop run-time and study FTI, a multilevel checkpoint library, on an IBM Blue Gene/Q. We show that FTI has a low energy footprint and that, consequently, optimal checkpoint-interval values with respect to time and energy are similar.
  • Keywords
    checkpointing; parallel machines; software fault tolerance; FTI; HPC; IBM Blue Gene/Q; energy-performance tradeoffs; fault-tolerance schemes; high-performance computing; multilevel checkpoint library; Checkpointing; Complexity theory; Encoding; Laboratories; Libraries; Power demand; Power measurement
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2014 IEEE International Conference on
  • Conference_Location
    Madrid
  • Type

    conf

  • DOI
    10.1109/CLUSTER.2014.6968749
  • Filename
    6968749