DocumentCode
2906948
Title
A Failure Detection Service for Internet-Based Multi-AS Distributed Systems
Author
Moraes, Dionei M. ; Duarte, Elias P.
Author_Institution
Dept. Inf., Fed. Univ. of Parana, Curitiba, Brazil
fYear
2011
fDate
7-9 Dec. 2011
Firstpage
260
Lastpage
267
Abstract
Failure detectors are one of the basic building blocks of fault-tolerant distributed systems. A failure detector is a distributed oracle that provides information about the state of the processes of a distributed system. This work presents a failure detector service for Internet-based distributed systems that span multiple autonomous systems. The service is based on monitors that provide global process state information through an SNMP interface. A monitor executes on each network where processes are monitored. Monitors on different networks communicate across the Internet using Web Services. The system was implemented and evaluated with monitored processes running both on a single LAN and distributed throughout the world on PlanetLab. Experimental results are presented, showing CPU usage, failure detection latency, and mistake rate.
Keywords
Web services; fault tolerant computing; local area networks; system recovery; CPU usage; Internet-based distributed systems; Internet-based multiAS distributed systems; LAN; Planet Lab; SNMP interface; autonomous systems; failure detection latency; failure detection service; failure detector service; fault-tolerant distributed systems; global process state information; Biomedical monitoring; Computer crashes; Detectors; Heart beat; Monitoring; Distributed Systems Management; Failure Detectors; Multi-AS Internet Systems; Process Management
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Systems (ICPADS), 2011 IEEE 17th International Conference on
Conference_Location
Tainan
ISSN
1521-9097
Print_ISBN
978-1-4577-1875-5
Type
conf
DOI
10.1109/ICPADS.2011.5
Filename
6121286