DocumentCode :
3171236
Title :
Failure detectors as first class objects
Author :
Felber, Pascal ; Défago, Xavier ; Guerraoui, Rachid ; Oser, Philipp
Author_Institution :
Oper. Syst. Lab., Fed. Inst. of Technol., Lausanne, Switzerland
fYear :
1999
fDate :
1999
Firstpage :
132
Lastpage :
141
Abstract :
One of the fundamental differences between a centralized system and a distributed one is the notion of partial failures. The ability to efficiently and accurately detect failures is a key element underlying reliable distributed computing. In current distributed systems, however, failure detection is either left to the application developer or hidden from the programmer and provided in an ad-hoc manner behind the scenes. We plead for an intermediate approach where failure detectors are first-class objects. We view failure detection as an abstraction, the complexity of which is encapsulated behind well-defined interfaces. The various roles of a failure detection service are all represented as first-class objects. Following our approach, one can reuse existing failure detection protocols as they are, or, through composition or refinement, one can define new protocols that match the application requirements. We describe an interesting result of a composition that mixes push and pull failure monitoring, and we show how scalability issues may be addressed by using a hierarchical failure detection configuration. We also discuss the implementation of our failure service both in CORBA and in Java.
Keywords :
Java; distributed object management; error detection; failure analysis; protocols; system monitoring; system recovery; CORBA; abstraction; application requirements; complexity; composition; distributed computing; distributed system; failure detection protocol reuse; failure detection service; failure detectors; first-class objects; hierarchical failure detection configuration; partial failures; push-pull failure monitoring; refinement; scalability; well-defined interfaces; Condition monitoring; Detectors; Layout; Load management; Network topology; Object detection; Operating systems; Programming profession; Protocols; Read only memory
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Distributed Objects and Applications, 1999. Proceedings of the International Symposium on
Conference_Location :
Edinburgh
Print_ISBN :
0-7695-0182-6
Type :
conf
DOI :
10.1109/DOA.1999.794001
Filename :
794001