Title :
Improving the Quality of Service of Fault Detection in Distributed Platforms under Adverse Network Conditions
Author :
Lemos, Fernando Tarlá Cardoso ; Sato, Liria Matsumoto
Abstract :
Fault detection is a core functionality required by most fault tolerance strategies, but it often depends on reliable communication between the computing nodes that exchange monitoring information. We present techniques to improve the robustness of fault detectors for distributed platforms in situations where network connectivity is affected by packet loss and delays. Such network conditions can be found in computing grids that connect geographically distant resources. We present results from experimental tests conducted in a simulated environment. The results show significant improvement over traditional approaches.
Keywords :
digital simulation; grid computing; quality of service; software fault tolerance; software reliability; adverse network conditions; computing grids; computing nodes; delays; distributed platforms; fault detection; fault tolerance strategies; geographically distant resources; monitoring information; network connectivity; packet loss; reliable communication; simulated environment; Biomedical monitoring; Computational modeling; Detectors; Heart beat; Monitoring; Payloads; Software; Distributed Failure Detectors; Failure Detection; Fault Tolerance
Conference_Title :
Computer Systems (WSCAD-SSC), 2012 13th Symposium on
Conference_Location :
Petropolis
Print_ISBN :
978-1-4673-4468-5
DOI :
10.1109/WSCAD-SSC.2012.25