مرکز منطقه ای اطلاع رساني علوم و فناوري - On the implementation of unreliable failure detectors in partially synchronous systems

DocumentCode :

1053808

Title :

On the implementation of unreliable failure detectors in partially synchronous systems

Author :

Larrea, Mikel ; Fernández, Antonio ; Arévalo, Sergio

Author_Institution :

Dept. de Arquitectura y Tecnologia de Comput., Pais Vasco Univ., San Sebastian, Spain

Volume :

Issue :

fYear :

2004

fDate :

7/1/2004 12:00:00 AM

Firstpage :

815

Lastpage :

828

Abstract :

Unreliable failure detectors were proposed by Chandra and Toueg as mechanisms that provide information about process failures. Chandra and Toueg defined eight classes of failure detectors, depending on how accurate this information is, and presented an algorithm implementing a failure detector of one of these classes in a partially synchronous system. This algorithm is based on all-to-all communication and periodically exchanges a number of messages that is quadratic on the number of processes. We study the implementability of different classes of failure detectors in several models of partial synchrony. We first show that no failure detector with perpetual accuracy (namely, P, Q, S, and W) can be implemented in these models in systems with even a single failure. We also show that, in these models of partial synchrony, it is necessary a majority of correct processes to implement a failure detector of the class θ proposed by Aguilera et al. Then, we present a family of distributed algorithms that implement the four classes of unreliable failure detectors with eventual accuracy (namely, ◊P, ◊Q, ◊S, and ◊W). Our algorithms are based on a logical ring arrangement of the processes, which defines the monitoring and failure information propagation pattern. The resulting algorithms periodically exchange at most a linear number of messages.

Keywords :

computational complexity; distributed algorithms; system recovery; all-to-all communication; consensus problem; crash failures; distributed algorithms; distributed systems; failure information propagation pattern; logical ring arrangement; partially synchronous system; process failures; unreliable failure detector; Algorithm design and analysis; Computational modeling; Computer crashes; Condition monitoring; Detectors; Distributed algorithms; Distributed computing; Helium; Synchronous generators; Timing; 65; Consensus problem; crash failures; distributed systems; failure detection; partial synchrony; unreliable failure detectors.;

fLanguage :

English

Journal_Title :

Computers, IEEE Transactions on

Publisher :

ieee

ISSN :

0018-9340

Type :

jour

DOI :

10.1109/TC.2004.33

Filename :

1321043

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1053808