• DocumentCode
    327286
  • Title
    Low-cost fault-tolerance in barrier synchronizations
  • Author
    Kulkarni, Sandeep S.; Arora, Anish
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA
  • fYear
    1998
  • fDate
    10-14 Aug 1998
  • Firstpage
    132
  • Lastpage
    139
  • Abstract
    We show how tolerance to several types of faults can be effectively added to program computations that use barrier synchronization. We divide the faults that occur in practice into two classes, detectable and undetectable, and design a fully distributed program that tolerates the faults in both classes. Our program guarantees that every barrier is executed correctly even if detectable faults occur, and that eventually every barrier is executed correctly even if undetectable faults occur. Via analytical as well as simulation results, we show that the cost of adding fault-tolerance is low, in part by comparing the time required by our program with that required by its fault-intolerant counterpart.
  • Keywords
    message passing; parallel algorithms; parallel programming; software fault tolerance; synchronisation; barrier synchronizations; detectable faults; distributed program; low cost fault tolerance; program computations; simulation; undetectable faults; Analytical models; Communication standards; Concurrent computing; Costs; Fault detection; Fault tolerance; Information science; Message passing; Parallel algorithms; Workstations
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing, 1998. Proceedings. 1998 International Conference on
  • Conference_Location
    Minneapolis, MN
  • ISSN
    0190-3918
  • Print_ISBN
    0-8186-8650-2
  • Type
    conf
  • DOI
    10.1109/ICPP.1998.708472
  • Filename
    708472