Regional Information Center for Science and Technology - Evaluation of Software-Implemented Fault-Tolerance (SIFT) Approach in Gracefully Degradable Multi-Computer Systems

DocumentCode :

1196379

Title :

Evaluation of Software-Implemented Fault-Tolerance (SIFT) Approach in Gracefully Degradable Multi-Computer Systems

Author :

Avresky, Dimiter R. ; Geoghegan, Sean J. ; Varoglu, Yavuz

Author_Institution :

Dept. of Electr. & Comput. Eng., Northeastern Univ., Boston, MA

Volume :

Issue :

fYear :

2006

Firstpage :

451

Lastpage :

457

Abstract :

This paper presents an analytical method for evaluating the reliability improvement of a multi-computer system of any size based on Software-Implemented Fault-Tolerance (SIFT). The method is based on the equivalent failure rate Γ, the single-node failure rate λ, the number of nodes in the system N, the repair rate μ, the fault coverage factor c, the reconfiguration rate δ, and the percentages of blocking faults b₁ and b₂. The impact of these parameters on the reliability improvement has been evaluated for a gracefully degradable multi-computer system using the proposed analytical technique, which is based on Markov chains. To validate the approach, we used the SIFT method, which implements error detection at the node level, combined with a fast reconfiguration algorithm for avoiding faulty nodes. The proposed method is applicable to any multi-computer system topology. The evaluation presented in this paper combines analytical and experimental approaches, with the analytical part relying on Markov chains. The SIFT method has been successfully implemented on the nCube multi-computer system. The time overhead (reconfiguration and recomputation time) incurred by an injected fault, and the fault coverage factor c, are evaluated experimentally by means of a parallel version of the Software Object-Oriented Fault-Injection Tool (nSOFIT). The implemented SIFT approach can be used for real-time applications in which time constraints must be met despite failures in the gracefully degradable multi-computer system.
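As an illustration of how a coverage factor and graceful degradation can enter a Markov-chain reliability estimate, the sketch below computes the mean time to failure (MTTF) of a deliberately simplified pure-death chain. It is not the model from the paper: repair (μ), reconfiguration delay (δ), and blocking faults (b₁, b₂) are all ignored, and the function names, the `min_nodes` threshold, and the definition of the equivalent failure rate as 1/MTTF are assumptions made only for this example.

```python
def mttf_degradable(n_nodes: int, min_nodes: int, lam: float, c: float) -> float:
    """MTTF of a simplified gracefully degradable system (assumed model).

    State k = number of working nodes. In state k a node fails at rate k*lam.
    With probability c the fault is covered and the system degrades to k-1
    nodes; with probability 1-c the whole system fails. The system also fails
    when fewer than min_nodes nodes remain.
    """
    if not 1 <= min_nodes <= n_nodes:
        raise ValueError("need 1 <= min_nodes <= n_nodes")
    if lam <= 0 or not 0.0 <= c <= 1.0:
        raise ValueError("need lam > 0 and 0 <= c <= 1")

    mttf = 0.0  # expected remaining lifetime below min_nodes is zero
    for k in range(min_nodes, n_nodes + 1):
        # T_k = (mean time spent in state k) + c * T_{k-1}
        mttf = 1.0 / (k * lam) + c * mttf
    return mttf


def equivalent_failure_rate(n_nodes: int, min_nodes: int, lam: float, c: float) -> float:
    """Assumed definition for this sketch: reciprocal of the MTTF."""
    return 1.0 / mttf_degradable(n_nodes, min_nodes, lam, c)


def reliability_improvement(n_nodes: int, min_nodes: int, lam: float, c: float) -> float:
    """MTTF ratio against a system that fails on its first node failure."""
    baseline = 1.0 / (n_nodes * lam)
    return mttf_degradable(n_nodes, min_nodes, lam, c) / baseline


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the functions.
    print(reliability_improvement(n_nodes=8, min_nodes=4, lam=1e-4, c=0.95))
```

With c = 0 the sketch reduces to a system that fails on its first node failure, so the improvement ratio is 1; raising c toward 1 increases the ratio, which is the qualitative effect of the fault coverage factor that the abstract describes.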

Keywords :

Markov processes; error detection; object-oriented programming; real-time systems; software fault tolerance; system recovery; Markov chains; SIFT method; error detection; fast reconfiguration algorithm; multi-computer system; real-time application; software object-oriented fault-injection tool; software-implemented fault-tolerance; Central Processing Unit; Degradation; Fault detection; Fault tolerant systems; Object oriented modeling; Operating systems; Real time systems; Software tools; Topology; Upper bound; Fault tolerance; Markov chain; graceful degradation; mean time to failure; multi-computers; reconfiguration; reliability improvement;

fLanguage :

English

Journal_Title :

Reliability, IEEE Transactions on

Publisher :

IEEE

ISSN :

0018-9529

Type :

jour

DOI :

10.1109/TR.2006.879663

Filename :

1688080

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1196379