Title :
Improving the Robustness of Distributed Failure Detectors in Adverse Conditions
Author :
Lemos, F.T.C. ; Sato, L.M.
Author_Institution :
Univ. de Sao Paulo (USP), Sao Paulo, Brazil
Abstract :
Failure detection is at the core of most fault tolerance strategies, but it often depends on reliable communication. We present new failure detection algorithms that are suitable as components of a fault tolerance system deployed under adverse network conditions (such as loosely connected and administered computing grids). The algorithms pack redundancy into heartbeat messages, thereby improving on the robustness of traditional protocols. Experimental tests conducted in a simulated environment with adverse network conditions show significant improvement over existing solutions.
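The abstract does not say how redundancy is packed into the heartbeats. As an illustration only, the sketch below assumes one possible reading: each heartbeat also carries the sender's record of the latest sequence numbers it has seen from other nodes, so a monitor can learn that a node is alive even when that node's own heartbeats are lost. The names Heartbeat, Monitor, seen and timeout are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Heartbeat:
    sender: str
    seq: int
    # Assumed redundancy payload (not from the paper): the sender's record
    # of the latest sequence number it has seen from each other node.
    seen: dict = field(default_factory=dict)


class Monitor:
    """Tracks the freshest evidence of liveness for each known node."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.latest_seq = {}   # node -> highest sequence number known
        self.last_update = {}  # node -> local time that evidence arrived

    def _note(self, node: str, seq: int, now: float) -> None:
        # Only a newer sequence number counts as fresh evidence.
        if seq > self.latest_seq.get(node, -1):
            self.latest_seq[node] = seq
            self.last_update[node] = now

    def receive(self, hb: Heartbeat, now: float) -> None:
        # Direct evidence about the sender.
        self._note(hb.sender, hb.seq, now)
        # Indirect evidence relayed inside the payload.
        for node, seq in hb.seen.items():
            self._note(node, seq, now)

    def suspected(self, now: float) -> list:
        # A node is suspected once its freshest evidence is older than timeout.
        return sorted(
            node for node, t in self.last_update.items()
            if now - t > self.timeout
        )
```

In this sketch, a monitor that stops receiving heartbeats directly from node A but keeps receiving heartbeats from node B with a fresh entry for A in seen does not suspect A, whereas a monitor fed the same traffic without the payload would suspect A after the timeout.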
Keywords :
protocols; telecommunication network reliability; adverse network conditions; distributed failure detectors; fault tolerance strategies; heartbeat messages; reliable communication; Biomedical monitoring; Detectors; Fault tolerance; Heart beat; Monitoring; Payloads; Robustness; Failure Detection
Journal_Title :
Latin America Transactions, IEEE (Revista IEEE America Latina)
DOI :
10.1109/TLA.2012.6142485