Title :
Primary-shadow consistency issues in the DRB scheme and the recovery time bound
Author :
Kim, K.H.; Bacellar, Luiz; Subbaraman, Chittur
Author_Institution :
Department of Electrical and Computer Engineering, University of California, Irvine, CA, USA
fDate :
30 Oct-2 Nov 1996
Abstract :
The distributed recovery block (DRB) scheme is an approach for realizing both hardware and software fault tolerance in real-time distributed and parallel computer systems. We point out that, for the DRB scheme to yield high fault coverage and a low recovery time bound, the replicated application tasks in a DRB computing station must satisfy some important consistency requirements. We present newly identified approaches for meeting these requirements, which involve, among other things, integrating network surveillance and reconfiguration (NSR) techniques with the DRB scheme. The paper then analyzes the recovery time bound of the DRB scheme. The analysis is based on a modular, structured, concrete implementation model of the DRB scheme for local area network (LAN)-based distributed computer systems; this model, called the DRB/T LAN scheme, incorporates an NSR scheme and the newly identified consistency-ensuring mechanisms. Finally, we consider approaches for applying the DRB scheme to new types of application computation segments that had not been considered before, and discuss approaches for meeting the consistency requirements in such DRB stations. These approaches significantly broaden the application range of the DRB scheme.
Keywords :
computer network reliability; fault tolerant computing; local area networks; parallel machines; parallel programming; real-time systems; reliability; software fault tolerance; system recovery; DRB scheme; DRB/T LAN scheme; application computation segments; consistency ensuring mechanisms; consistency requirements; distributed recovery block scheme; fault coverage; local area network based distributed computer systems; modular structured concrete implementation model; network surveillance and reconfiguration; parallel computer systems; primary shadow consistency issues; real time distributed systems; recovery time bound; replicated application tasks; software fault tolerance; Application software; Computer networks; Concrete; Concurrent computing; Distributed computing; Fault tolerant systems; Hardware; Local area networks; Real time systems; Surveillance;
Conference_Title :
Proceedings of the Seventh International Symposium on Software Reliability Engineering (ISSRE 1996)
Conference_Location :
White Plains, NY
Print_ISBN :
0-8186-7707-4
DOI :
10.1109/ISSRE.1996.558888