Title :
Definition and specification of accrual failure detectors
Author :
Défago, Xavier ; Urbán, Péter ; Hayashibara, Naohiro ; Katayama, Takuya
Author_Institution :
Sch. of Inf. Sci., Japan Adv. Inst. of Sci. & Technol., Ishikawa, Japan
Date :
28 June-1 July 2005
Abstract :
For many years, people have been advocating the development of failure detection as a basic service, but, unfortunately, without much success so far. We believe that this is because important system engineering issues have not yet been addressed adequately, which has prevented the definition of a truly generic service. Ultimately, our goal is to define a service that is both simple and expressive, yet powerful enough to support the requirements of many distributed applications. To this end, we consider an alternative interaction model between the service and the applications, called accrual failure detectors. Roughly, an accrual failure detector associates with each process a real value representing a suspicion level, instead of the traditional binary information (i.e., trust vs. suspect). In this paper, we provide a rigorous definition for accrual failure detectors, demonstrate that changing the interaction model leads to no loss in computational power, discuss quality of service issues, and present several possible implementations.
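The interaction model described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the class `ElapsedTimeAccrualDetector`, the function `is_suspected`, and the choice of "seconds since the last heartbeat" as the suspicion level are all assumptions made for illustration. The point is only that the detector outputs a real value per process, and each application applies its own threshold to recover a binary trust/suspect decision.

```python
from typing import Dict, Hashable


class ElapsedTimeAccrualDetector:
    """Hypothetical accrual detector (illustrative assumption, not the
    paper's algorithm): the suspicion level of a monitored process is the
    time elapsed since its last heartbeat was received."""

    def __init__(self) -> None:
        # Arrival time of the most recent heartbeat, per monitored process.
        self._last_heartbeat: Dict[Hashable, float] = {}

    def heartbeat(self, process: Hashable, now: float) -> None:
        """Record that a heartbeat from `process` arrived at time `now`."""
        self._last_heartbeat[process] = now

    def suspicion_level(self, process: Hashable, now: float) -> float:
        """Return a non-negative real value that grows while `process`
        stays silent. Raises KeyError for a process never heard from."""
        return max(0.0, now - self._last_heartbeat[process])


def is_suspected(detector: ElapsedTimeAccrualDetector, process: Hashable,
                 now: float, threshold: float) -> bool:
    """Application-side interpretation: turn the suspicion level into the
    traditional binary answer using an application-chosen threshold."""
    return detector.suspicion_level(process, now) > threshold


# Two applications share one detector but choose different thresholds.
detector = ElapsedTimeAccrualDetector()
detector.heartbeat("p", now=10.0)
level = detector.suspicion_level("p", now=12.5)
aggressive = is_suspected(detector, "p", now=12.5, threshold=2.0)
conservative = is_suspected(detector, "p", now=12.5, threshold=5.0)
```

In this sketch the same suspicion level leads an application with a low threshold to suspect the process while one with a higher threshold still trusts it, which is the kind of per-application tuning the binary model does not offer directly.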
Keywords :
distributed processing; fault diagnosis; fault tolerant computing; quality of service; accrual failure detector; distributed application; system engineering; Detectors; Educational programs; Educational technology; Information science; Large-scale systems; Power engineering and energy; Power system modeling; Power system reliability; Systems engineering and theory
Conference_Title :
Dependable Systems and Networks, 2005. DSN 2005. Proceedings. International Conference on
Print_ISBN :
0-7695-2282-3
DOI :
10.1109/DSN.2005.37