• DocumentCode
    167403
  • Title
    Metrics for Evaluating Energy Saving Techniques for Resilient HPC Systems
  • Author
    Grant, Ryan E. ; Olivier, Stephen L. ; Laros, James H. ; Brightwell, Ron ; Porterfield, Allan K.
  • Author_Institution
    Sandia Nat. Labs., Albuquerque, NM, USA
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    790
  • Lastpage
    797
  • Abstract
    The metrics used for evaluating energy-saving techniques for future HPC systems are critical to the correct assessment of proposed methods. Current predictions forecast that overcoming reduced system reliability, increased power requirements, and increased energy consumption will be a major design challenge for future systems. Modern runtime energy-saving research efforts do not take into account the energy spent providing reliability. They also do not account for the increased probability of failure during application execution caused by the runtime overhead of energy-saving methods. While this is reasonable for current systems, it is insufficient for future-generation systems. By taking into account the energy consumption ramifications of increased runtimes on system reliability, better energy-saving techniques can be developed. This paper demonstrates how to determine the impact of runtime energy conservation methods within the context of failure-prone large-scale systems. In addition, several energy-saving methodologies are surveyed and analyzed with respect to their effectiveness in an environment in which failures occur.
  • Keywords
    parallel processing; power aware computing; probability; application execution; energy consumption ramifications; energy saving methods; energy saving techniques; failure probability; failure-prone large scale systems; metrics; power requirements; resilient HPC systems; runtime energy conservation methods; runtime energy-saving research efforts; system reliability; Checkpointing; Energy consumption; Equations; Measurement; Reliability; Runtime; Sockets; DVFS; HPC; energy saving; frequency scaling; power; reliability; voltage scaling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
  • Conference_Location
    Phoenix, AZ
  • Print_ISBN
    978-1-4799-4117-9
  • Type
    conf
  • DOI
    10.1109/IPDPSW.2014.91
  • Filename
    6969462