Title :
Detection of crashed objects in eACID and its comparison with other techniques
Author :
Hussain, Shujaat ; Qadir, Muhammad Abdul
Author_Institution :
Center for Distrib. & Semantic Comput., Mohammad Ali Jinnah Univ., Islamabad, Pakistan
Abstract :
Failure detectors are a key component of fault-tolerant distributed systems, because it is important to identify a suspected or crashed object and take recovery steps that keep the system running. The main objective of fault monitoring is to identify faults quickly and correctly. A fault monitoring system that is quick to declare faults increases the chance of false alarms, i.e., declaring a fault where none has occurred. An ideal fault monitoring system therefore identifies faults as quickly as possible without increasing the number of false alarms. One of the major responsibilities of the monitor is to adapt the timeout and monitoring intervals to the dynamic network and system conditions, and to set them close to the actual delays in the system. The adapted timeout and monitoring intervals must not fluctuate with large amplitudes around the actual delays; otherwise, the number of false alarms increases or the identification of faults is delayed. Our algorithm, eACID (enhanced Adaptive Convergent Intelligent fault monitoring in Distributed systems), when compared with the best known algorithm, ADAPTATION [Sotama et al.], yielded 16% fewer false timeouts and 9% more utilization of responses. eACID adapts the timeout based on the previous history, which gives a fair idea of the workload. Our scheme does not take decisions on transient behavior of the system; moreover, it uses a threshold that is set from knowledge of past network behavior. This threshold depends on the number of consecutive timeouts that have occurred. If the threshold is crossed, the monitored object is declared dead.
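The mechanism outlined in the abstract (a timeout adapted from past delays, plus a threshold on consecutive timeouts before an object is declared dead) can be illustrated with a minimal sketch. This is not the published eACID algorithm: the class name `AdaptiveMonitor`, the sliding-window mean, the `margin` multiplier, and all default values are assumptions chosen only for illustration.

```python
from collections import deque


class AdaptiveMonitor:
    """Illustrative history-based failure detector (NOT the published eACID rules)."""

    def __init__(self, initial_timeout=1.0, window=5, margin=1.5, max_consecutive=3):
        self.delays = deque(maxlen=window)      # recent observed response delays
        self.timeout = initial_timeout          # current adaptive timeout (seconds)
        self.margin = margin                    # assumed safety factor over the mean delay
        self.max_consecutive = max_consecutive  # assumed consecutive-timeout threshold
        self.consecutive_timeouts = 0
        self.crashed = False

    def observe(self, delay):
        """Record one monitoring round; `delay` is None if no response arrived."""
        if delay is None or delay > self.timeout:
            # Missing or late response: count it, but do not decide on one transient miss.
            self.consecutive_timeouts += 1
            if self.consecutive_timeouts >= self.max_consecutive:
                self.crashed = True
        else:
            self.consecutive_timeouts = 0
        if delay is not None:
            # Adapt the timeout from the history of delays, not from a single sample.
            self.delays.append(delay)
            self.timeout = self.margin * sum(self.delays) / len(self.delays)
        return self.crashed


monitor = AdaptiveMonitor()
for d in [0.2, None, 0.25, None, None, None]:
    monitor.observe(d)
print(monitor.crashed)
```

In this sketch a single missed response only increments a counter, and the object is reported as crashed only after `max_consecutive` misses in a row, which mirrors the abstract's point about not reacting to transient behavior.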
Keywords :
distributed processing; object detection; software fault tolerance; crashed objects; eACID; enhanced adaptive convergent intelligent fault monitoring in distributed systems; fault monitoring activity; fault tolerant distributed systems; Computer crashes; Condition monitoring; Delay systems; Detectors; Distributed computing; Fault detection; Fault diagnosis; Fault tolerant systems; History; adaptation; benchmark; fault detectors; fault monitoring; timeout
Conference_Title :
Multitopic Conference, 2009. INMIC 2009. IEEE 13th International
Conference_Location :
Islamabad
Print_ISBN :
978-1-4244-4872-2
Electronic_ISBN :
978-1-4244-4873-9
DOI :
10.1109/INMIC.2009.5383094