• DocumentCode
    187277
  • Title

    Fault Injection Experiments with the CLAMR Hydrodynamics Mini-App

  • Author

    Atkinson, Brian ; DeBardeleben, Nathan ; Qiang Guan ; Robey, Robert ; Jones, William M.

  • Author_Institution
    Los Alamos Nat. Lab., Ultrascale Syst. Res. Center, Los Alamos, NM, USA
  • fYear
    2014
  • fDate
    3-6 Nov. 2014
  • Firstpage
    6
  • Lastpage
    9
  • Abstract
    In this paper, we present a resilience analysis of the impact of soft errors on CLAMR, a hydrodynamics mini-app for high performance computing (HPC). We utilize F-SEFI, a fine-grained fault injection tool, to inject faults into the kernel routines of CLAMR. We demonstrate visually the impact of these faults as they are either benign (have no impact on the results), cause silent data corruption (SDC), or cause the application to crash due to instabilities. We quantify the probability that an injected fault will cause CLAMR to transition to one of the above three states using F-SEFI. Finally, we explore the relationship between the application's fault characteristics and when the fault is injected in simulation time. Overall, we find that 17% and 24% of the faults propagate into SDC and crashes, respectively.
  • Keywords
    fault tolerant computing; hydrodynamics; parallel processing; program diagnostics; CLAMR hydrodynamics mini-app; F-SEFI tool; HPC; SDC; cell-based adaptive mesh refinement; fault injection experiments; fine grained fault injection tool; high performance computing; resilience analysis; silent data corruption; Circuit faults; Computer crashes; Fault tolerance; Fault tolerant systems; Kernel; Laboratories; Resilience; fault injection; fault-tolerance; hydrodynamics; mini-app; resilience
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Reliability Engineering Workshops (ISSREW), 2014 IEEE International Symposium on
  • Conference_Location
    Naples
  • Type

    conf

  • DOI
    10.1109/ISSREW.2014.51
  • Filename
    6983788