DocumentCode :
327286
Title :
Low-cost fault-tolerance in barrier synchronizations
Author :
Kulkarni, Sandeep S. ; Arora, Anish
Author_Institution :
Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA
fYear :
1998
fDate :
10-14 Aug 1998
Firstpage :
132
Lastpage :
139
Abstract :
We show how tolerance to several types of faults can be effectively added to program computations that use barrier synchronization. We divide the faults that occur in practice into two classes, detectable and undetectable, and design a fully distributed program that tolerates the faults in both classes. Our program guarantees that every barrier is executed correctly even if detectable faults occur, and that eventually every barrier is executed correctly even if undetectable faults occur. Via analytical as well as simulation results, we show that the cost of adding fault-tolerance is low, in part by comparing the time required by our program with that required by its fault-intolerant counterpart.
Keywords :
message passing; parallel algorithms; parallel programming; software fault tolerance; synchronisation; barrier synchronizations; detectable faults; distributed program; low cost fault tolerance; message passing; program computations; simulation; undetectable faults; Analytical models; Communication standards; Concurrent computing; Costs; Fault detection; Fault tolerance; Information science; Message passing; Parallel algorithms; Workstations;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Processing, 1998. Proceedings. 1998 International Conference on
Conference_Location :
Minneapolis, MN
ISSN :
0190-3918
Print_ISBN :
0-8186-8650-2
Type :
conf
DOI :
10.1109/ICPP.1998.708472
Filename :
708472