DocumentCode :
381324
Title :
Fault-tolerant systems design-estimating cache contents and usage
Author :
Some, Raphael R. ; Beahan, John ; Khanoyan, Garen ; Callum, Leslie N. ; Agrawal, Anil
Author_Institution :
Jet Propulsion Lab., California Inst. of Technol., Pasadena, CA, USA
Volume :
5
fYear :
2002
fDate :
2002
Firstpage :
91068
Abstract :
Development of the Remote Exploration and Experimentation (REE) Commercial Off The Shelf (COTS) based space-borne supercomputer requires a detailed knowledge of system behavior in the presence of Single Event Upset (SEU) induced faults. When combined with a hardware radiation fault model and mission environment data in a medium grained system model, experimentally obtained fault behavior data can be used to: predict system reliability, availability and performance; determine optimal fault detection methods and boundaries; and define high Return On Investment (ROI) fault tolerance strategies. The REE project has developed a fault injection suite of tools and a methodology for experimentally determining system behavior statistics in the presence of SEU induced transient faults in application level codes. Where faults cannot be directly injected, analytic means are used in conjunction with experimental data to determine probabilistic system fault response. In many processors, it is not possible to inject faults directly into onboard cache. In this case, a cache contents estimation tool can be used to define probabilistic fault susceptibility, which is then combined with direct memory fault injection data to determine fault behavior statistics. In this paper we discuss the structure, function and usage of a PPC-750 cache contents estimator for the REE project.
Keywords :
aerospace computing; cache storage; fault tolerant computing; parallel machines; radiation effects; space vehicle electronics; CacheSim; PPC-750; REE COTS based space-borne supercomputer; REE project; Remote Exploration Experimentation project; SEU induced faults; cache contents estimation tool; direct memory fault injection data; fault behavior statistics; fault injection suite of tools; fault-tolerant systems design; hardware radiation fault model; high ROI fault tolerance strategies; medium grained system model; mission environment data; onboard cache; probabilistic fault susceptibility; single event upset; Computer architecture; Concurrent computing; Fault tolerance; Fault tolerant systems; Hardware; Investments; Predictive models; Space technology; Space vehicles; Weight control;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Aerospace Conference Proceedings, 2002. IEEE
Print_ISBN :
0-7803-7231-X
Type :
conf
DOI :
10.1109/AERO.2002.1035380
Filename :
1035380