DocumentCode :
390040
Title :
Failure detectors for large-scale distributed systems
Author :
Hayashibara, Naohiro ; Cherif, Adel ; Katayama, Takuya
Author_Institution :
Graduate Sch. of Inf. Sci., JAIST, Ishikawa, Japan
fYear :
2002
fDate :
2002
Firstpage :
404
Lastpage :
409
Abstract :
This paper discusses the problem of implementing a scalable failure detection service for grid systems. More specifically, traditional implementations of failure detectors are often tuned for running over local networks and fail to address important problems found in wide-area distributed systems, such as grid systems. We identify some of the most important problems raised in the context of grids. We then survey recent propositions that can help in solving some of these problems.
Keywords :
computer network reliability; fault tolerant computing; wide area networks; failure detectors; grid systems; large-scale distributed systems; scalable failure detection service; wide-area distributed systems; Computer crashes; Computer networks; Computerized monitoring; Condition monitoring; Detectors; Distributed computing; Grid computing; Information science; Large-scale systems; Protocols;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Reliable Distributed Systems, 2002. Proceedings. 21st IEEE Symposium on
ISSN :
1060-9857
Print_ISBN :
0-7695-1659-9
Type :
conf
DOI :
10.1109/RELDIS.2002.1180218
Filename :
1180218