DocumentCode :
2333627
Title :
Low-cost flexible software fault tolerance for distributed computing
Author :
Tai, Ann T. ; Tso, Kam S. ; Sanders, William H. ; Alkalai, Leon ; Chau, Savio N.
Author_Institution :
IA Tech, Inc, Los Angeles, CA, USA
fYear :
2001
fDate :
27-30 Nov. 2001
Firstpage :
148
Lastpage :
157
Abstract :
We revisit the problem of software fault tolerance in distributed systems. In particular, we propose an extension of a message-driven confidence-driven (MDCD) protocol that we previously developed for error containment and recovery in a particular type of distributed embedded system. More specifically, we augment the original MDCD protocol with a method of "fine-grained confidence adjustment," which removes the architectural restrictions of the original protocol. The dynamic nature of the MDCD approach gives it several desirable characteristics. First, the approach imposes no restrictions on interactions among application software components and does not require costly message-exchange-based process coordination or synchronization. Second, the algorithms allow redundancy to be applied only to low-confidence or critical interacting software components in a distributed system, permitting a flexible realization of software fault tolerance. Finally, the dynamic error containment and recovery mechanisms are transparent to the application and can be implemented by generic middleware.
Keywords :
distributed algorithms; message passing; software fault tolerance; system recovery; MDCD protocol; application software components; architectural restrictions; critical interacting software components; distributed computing; distributed embedded system; distributed systems; dynamic error containment; dynamic nature; error recovery; fine grained confidence adjustment; generic middleware; low-cost flexible software fault tolerance; message-driven confidence-driven protocol; message-exchange based process coordination/synchronization; Aerodynamics; Application software; Distributed computing; Fault tolerance; Fault tolerant systems; Laboratories; Propulsion; Protocols; Redundancy; Space technology;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Software Reliability Engineering, 2001. ISSRE 2001. Proceedings. 12th International Symposium on
ISSN :
1071-9458
Print_ISBN :
0-7695-1306-9
Type :
conf
DOI :
10.1109/ISSRE.2001.989468
Filename :
989468