DocumentCode
327286
Title
Low-cost fault-tolerance in barrier synchronizations
Author
Kulkarni, Sandeep S. ; Arora, Anish
Author_Institution
Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA
fYear
1998
fDate
10-14 Aug 1998
Firstpage
132
Lastpage
139
Abstract
We show how tolerance to several types of faults can be effectively added to program computations that use barrier synchronization. We divide the faults that occur in practice into two classes, detectable and undetectable, and design a fully distributed program that tolerates the faults in both classes. Our program guarantees that every barrier is executed correctly even if detectable faults occur, and that eventually every barrier is executed correctly even if undetectable faults occur. Via analytical as well as simulation results, we show that the cost of adding fault-tolerance is low, in part by comparing the times required by our program with those required by its fault-intolerant counterpart.
Keywords
message passing; parallel algorithms; parallel programming; software fault tolerance; synchronisation; barrier synchronizations; detectable faults; distributed program; low cost fault tolerance; message passing; program computations; simulation; undetectable faults; Analytical models; Communication standards; Concurrent computing; Costs; Fault detection; Fault tolerance; Information science; Message passing; Parallel algorithms; Workstations;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Processing, 1998. Proceedings. 1998 International Conference on
Conference_Location
Minneapolis, MN
ISSN
0190-3918
Print_ISBN
0-8186-8650-2
Type
conf
DOI
10.1109/ICPP.1998.708472
Filename
708472