Regional Information Center for Science and Technology

DocumentCode :

785440

Title :

Self-repairing processor modules

Author :

Kilmer, William L.

Author_Institution :

Massachusetts Univ., Amherst, MA, USA

Volume :

Issue :

fYear :

1995

fDate :

6/1/1995

Firstpage :

327

Lastpage :

332

Abstract :

A processor is any self-contained computer of at least personal-computer capability. The paper explores how much the processor mean time-to-failure can be improved by replacing it with an N-processor module, where each processor in the module consists of a copy of the original processor augmented with a communication protocol unit. The copy of the original processor is faulty with probability p_c, and the protocol unit is faulty with probability p. The asynchronous N-processor module uses a Byzantine agreement (F-ID-P) algorithm to identify which of its processors disagreed with a module consensus. The identified processors are presumed faulty, and the module replaces them with duplicates from a set of standbys. The F-ID-P algorithm is a modification of Bracha's, which guarantees that in a module of 3t+1 processors, up to t faults can be identified by at least t+1 nonfaulty processors. The module fails if faults in more than t of its processors prevent it from: 1) obtaining a correct consensus, or 2) executing the algorithm. The F-ID-P algorithm departs from Bracha's by using a random scheduler of message delays instead of an adversary scheduler. Simulation showed that the F-ID-P algorithm almost always correctly identified all of a module's faulty processors if more than half of the module's processors were nonfaulty. Thus the F-ID-P algorithm was about 3/2 times as fault tolerant as guaranteed (tolerating fewer than N/2 faulty processors rather than the guaranteed t = (N-1)/3). Also, compared with a single processor's mean number of decisions to failure, the F-ID-P module was 841 times better when N=37, down to 5.1 times better when N=10.
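The fault-tolerance bounds in the abstract can be illustrated with a small calculation. This is a minimal sketch under assumptions that the record does not state: a module processor counts as faulty when either its processor copy (probability p_c) or its protocol unit (probability p) is faulty, the two fail independently, and processors fail independently of one another. It computes the probability that more than a given number of the N processors are faulty, and compares the guaranteed bound t (from N >= 3t+1) with the "more than half nonfaulty" bound observed in simulation. All function names and the example values of p_c and p are hypothetical. It is a static snapshot only; it does not model standby replacement or the paper's mean number of decisions to failure.

```python
from math import comb


def processor_fault_prob(p_c: float, p: float) -> float:
    """Probability that one module processor is faulty.

    Assumption (not from the paper): the processor copy (fault prob p_c) and the
    protocol unit (fault prob p) fail independently, and either fault makes the
    whole processor faulty.
    """
    return 1.0 - (1.0 - p_c) * (1.0 - p)


def prob_more_than_k_faulty(n: int, k: int, q: float) -> float:
    """Binomial tail: P(more than k of n processors are faulty).

    Assumes each processor is faulty independently with probability q.
    """
    return sum(comb(n, j) * q**j * (1.0 - q) ** (n - j) for j in range(k + 1, n + 1))


def guaranteed_t(n: int) -> int:
    """Largest t with n >= 3t + 1 (the bound guaranteed by Bracha-style agreement)."""
    return (n - 1) // 3


def majority_bound(n: int) -> int:
    """Largest number of faulty processors that still leaves more than half nonfaulty."""
    return (n - 1) // 2


if __name__ == "__main__":
    q = processor_fault_prob(p_c=0.05, p=0.01)  # illustrative values only
    for n in (10, 37):
        t, m = guaranteed_t(n), majority_bound(n)
        print(
            f"N={n}: t={t}, majority bound={m}, ratio={m / t:.2f}, "
            f"P(>t faulty)={prob_more_than_k_faulty(n, t, q):.2e}, "
            f"P(>{m} faulty)={prob_more_than_k_faulty(n, m, q):.2e}"
        )
```

For N=37 the two bounds are t=12 and 18 faulty processors (ratio 1.5); for N=10 they are 3 and 4 (ratio about 1.33), which is in line with the abstract's "about 3/2" figure for larger modules.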

Keywords :

failure analysis; fault tolerant computing; probability; protocols; redundancy; reliability; Byzantine agreement algorithm; F-ID-P algorithm; asynchronous N-processor module; communication protocol unit; fault tolerance; mean time-to-failure; message delays; personal-computer; probability; random scheduler; self-contained computer; self-repairing processor modules; Broadcasting; Computer networks; Delay effects; Fault diagnosis; Fault tolerance; Military computing; Processor scheduling; Protocols; Redundancy; Scheduling algorithm;

fLanguage :

English

Journal_Title :

IEEE Transactions on Reliability

Publisher :

IEEE

ISSN :

0018-9529

Type :

Journal

DOI :

10.1109/24.387390

Filename :

387390

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=785440