• DocumentCode
    3349897
  • Title
    A software-implemented fault injection methodology for design and validation of system fault tolerance
  • Author
    Some, Raphael R.; Kim, Won S.; Khanoyan, Garen; Callum, Leslie; Agrawal, Anil; Beahan, John J.
  • Author_Institution
    Jet Propulsion Lab., California Inst. of Technol., Pasadena, CA, USA
  • fYear
    2001
  • fDate
    1-4 July 2001
  • Firstpage
    501
  • Lastpage
    506
  • Abstract
    Presents our experience in developing a methodology and tool at the Jet Propulsion Laboratory (JPL) for software-implemented fault injection (SWIFI) into a parallel-processing supercomputer that is being designed for use in next-generation space exploration missions. The fault injector uses software-based strategies to emulate the effects of radiation-induced transients occurring in the system hardware components. JPL's SWIFI tool set, called JIFI (JPL's Implementation of a Fault Injector), is being used in conjunction with an appropriate system fault model to evaluate candidate hardware and software fault tolerance architectures, to determine the sensitivity of applications to faults, and to measure the effectiveness of fault detection, isolation and recovery strategies. JIFI has been validated to inject faults into user-specified CPU registers and memory regions with a uniform random distribution in location and time. Together with verifiers, classifiers and run scripts, JIFI enables massive fault injection campaigns and statistical data analysis.
  • Keywords
    aerospace computing; computer testing; fault tolerant computing; parallel processing; program testing; JIFI; Jet Propulsion Laboratory; SWIFI; classifiers; fault detection strategies; fault injection campaigns; fault isolation strategies; fault recovery strategies; fault sensitivity; fault tolerance architecture evaluation; memory regions; parallel-processing supercomputer; radiation-induced transients; run scripts; software-based strategies; software-implemented fault injection; space exploration missions; statistical data analysis; system fault model; system fault tolerance design; system fault tolerance validation; system hardware components; uniform random distribution; user-specified CPU registers; verifiers; Application software; Computer architecture; Design methodology; Fault tolerant systems; Hardware; Laboratories; Propulsion; Software tools; Space exploration; Supercomputers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Dependable Systems and Networks, 2001. DSN 2001. International Conference on
  • Conference_Location
    Goteborg, Sweden
  • Print_ISBN
    0-7695-1101-5
  • Type
    conf
  • DOI
    10.1109/DSN.2001.941435
  • Filename
    941435
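The abstract above describes injecting faults into user-specified CPU registers and memory regions with a uniform random distribution in location and time, then using verifiers and classifiers over many runs. The following is a minimal sketch of that idea, not JIFI's implementation. It assumes a single-bit-flip fault model and emulates "registers" and "memory" as Python bytearrays rather than the state of a live process. All names (`plan_injection`, `inject`, `run_workload`, the toy accumulate workload, and the two outcome classes) are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Injection:
    """One planned fault: where (target, offset, bit) and when (step)."""
    target: str  # name of the emulated register or memory region
    offset: int  # byte offset within the target
    bit: int     # bit position within that byte (0-7)
    step: int    # time step at which the fault is injected


def plan_injection(sizes, total_steps, rng):
    """Draw a fault location and time, each uniformly at random.

    Location is uniform over every bit of every target combined, so a
    larger region is hit proportionally more often.
    """
    total_bits = sum(size * 8 for size in sizes.values())
    flat = rng.randrange(total_bits)
    for name, size in sizes.items():
        if flat < size * 8:
            offset, bit = divmod(flat, 8)
            return Injection(name, offset, bit, rng.randrange(total_steps))
        flat -= size * 8
    raise AssertionError("unreachable: flat index out of range")


def inject(state, inj):
    """Flip one bit to emulate a radiation-induced transient (assumed model)."""
    state[inj.target][inj.offset] ^= 1 << inj.bit


def fresh_state():
    """Hypothetical target state: a 32-bit register r0 and 16 bytes of memory."""
    return {"r0": bytearray(4), "mem": bytearray(range(16))}


def run_workload(state, total_steps, inj=None):
    """Toy workload: accumulate memory bytes into r0, one byte per step."""
    mem, r0 = state["mem"], state["r0"]
    for step in range(total_steps):
        if inj is not None and step == inj.step:
            inject(state, inj)  # mutates mem or r0 in place
        acc = int.from_bytes(r0, "little")
        acc = (acc + mem[step % len(mem)]) & 0xFFFFFFFF
        r0[:] = acc.to_bytes(4, "little")
    return int.from_bytes(r0, "little")


def campaign(n_runs, total_steps=32, seed=0):
    """Run many injections; a verifier compares each result to a golden run."""
    rng = random.Random(seed)
    golden = run_workload(fresh_state(), total_steps)  # fault-free reference
    sizes = {name: len(buf) for name, buf in fresh_state().items()}
    outcomes = {"masked": 0, "wrong_output": 0}
    for _ in range(n_runs):
        inj = plan_injection(sizes, total_steps, rng)
        result = run_workload(fresh_state(), total_steps, inj)
        # Classifier: only two coarse classes in this sketch.
        outcomes["masked" if result == golden else "wrong_output"] += 1
    return outcomes


if __name__ == "__main__":
    print(campaign(1000))
```

The toy workload shows why classification is needed. A bit flipped in a memory byte after its last read is masked, while a flip in the accumulator register always propagates to the output. A real campaign would add classes such as crashes, hangs, and detected errors.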