DocumentCode
2256276
Title
An integrated error-recovery scheme for multicomputers
Author
DeBrunner, Linda S. ; Ghali, Prasanna
Author_Institution
Dept. of Electr. Eng., Oklahoma Univ., Norman, OK, USA
fYear
1993
fDate
1-3 Nov 1993
Firstpage
817
Abstract
The paper examines the problem of recovering from processing node failures in multicomputers. A general-purpose, application-transparent integrated error-recovery scheme is presented that recovers the multicomputer from processing node failures in the absence of concurrent fault detection and diagnosis facilities. In the scheme, a distributed system-level fault diagnosis algorithm and error recovery algorithms cooperate to obtain a set of consistent and error-free checkpoints. Integrating the fault diagnosis and error recovery algorithms permits the implementation of an effective and comprehensive fault tolerance scheme for a wide variety of distributed systems and multicomputer networks.
Keywords
computer network reliability; fault diagnosis; fault tolerant computing; multiprocessing systems; system recovery; application-transparent integrated error-recovery scheme; concurrent fault detection; distributed systems; error recovery algorithms; error-free checkpoints; fault tolerance scheme; integrated error-recovery scheme; multicomputer networks; multicomputers; processing node failures; Circuit faults; Clocks; Fault detection; Fault diagnosis; Fault tolerance; Fault tolerant systems; Routing protocols; Switching circuits; System recovery; Testing
fLanguage
English
Publisher
IEEE
Conference_Titel
1993 Conference Record of the Twenty-Seventh Asilomar Conference on Signals, Systems and Computers
Conference_Location
Pacific Grove, CA
ISSN
1058-6393
Print_ISBN
0-8186-4120-7
Type
conf
DOI
10.1109/ACSSC.1993.342635
Filename
342635