Title :
A repetitive fault tolerance model for parallel programs
Author :
Yen, I-Ling ; Leiss, Ernst L. ; Bastani, Farokh B.
Author_Institution :
Dept. of Comput. Sci., Houston Univ., TX, USA
Abstract :
The authors propose a repetitive fault tolerance (RFT) model that provides an environment for the systematic development of fault-tolerant parallel programs. RFT programs tolerate processor failures without sacrificing failure-free performance: the system delivers optimal performance when all processors are working and continues to run, at reduced performance, when failures occur. The system keeps working as long as at least one processor is working. The model thus provides both a software solution for a highly reliable parallel computation environment and an elegant solution for constructing reliable nonrepairable systems. The model is applied to three examples to illustrate the construction procedure, evaluate the performance of RFT programs, and demonstrate the model's applicability.
Keywords :
fault tolerant computing; parallel programming; performance evaluation; programming environments; nonrepairable systems; optimal performance; parallel computation environment; parallel programs; processor failures; repetitive fault tolerance model; Application software; Checkpointing; Computer science; Degradation; Fault tolerance; Fault tolerant systems; Hardware; Redundancy; Space exploration; Very large scale integration
Conference_Title :
Proceedings of the Twenty-Sixth Hawaii International Conference on System Sciences, 1993
Conference_Location :
Wailea, HI
Print_ISBN :
0-8186-3230-5
DOI :
10.1109/HICSS.1993.284081