DocumentCode :
2256276
Title :
An integrated error-recovery scheme for multicomputers
Author :
DeBrunner, Linda S. ; Ghali, Prasanna
Author_Institution :
Dept. of Electr. Eng., Oklahoma Univ., Norman, OK, USA
fYear :
1993
fDate :
1-3 Nov 1993
Firstpage :
817
Abstract :
The paper examines the problem of recovering from processing node failures in multicomputers. A general-purpose, application-transparent integrated error-recovery scheme is presented to recover the multicomputer from processing node failures in the absence of concurrent fault detection and diagnosis facilities. In the scheme, a distributed system-level fault diagnosis algorithm and error recovery algorithms cooperate to obtain a set of consistent and error-free checkpoints. The integration of fault diagnosis and error recovery algorithms permits the implementation of an effective and comprehensive fault tolerance scheme for a wide variety of distributed systems and multicomputer networks.
Keywords :
computer network reliability; fault diagnosis; fault tolerant computing; multiprocessing systems; system recovery; application-transparent integrated error-recovery scheme; concurrent fault detection; distributed systems; error recovery algorithms; error-free checkpoints; fault tolerance scheme; integrated error-recovery scheme; multicomputer networks; multicomputers; processing node failures; Circuit faults; Clocks; Fault detection; Fault diagnosis; Fault tolerance; Fault tolerant systems; Routing protocols; Switching circuits; System recovery; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
1993 Conference Record of the Twenty-Seventh Asilomar Conference on Signals, Systems and Computers
Conference_Location :
Pacific Grove, CA
ISSN :
1058-6393
Print_ISBN :
0-8186-4120-7
Type :
conf
DOI :
10.1109/ACSSC.1993.342635
Filename :
342635