DocumentCode :
2609974
Title :
A fault detection service for wide area distributed computations
Author :
Stelling, Paul ; Foster, Ian ; Kesselman, Carl ; Lee, Craig ; Von Laszewski, Gregor
Author_Institution :
Aerosp. Corp., El Segundo, CA, USA
fYear :
1998
fDate :
28-31 Jul 1998
Firstpage :
268
Lastpage :
278
Abstract :
The potential for faults in distributed computing systems is a significant complicating factor for application developers. While a variety of techniques exist for detecting and correcting faults, the implementation of these techniques in a particular context can be difficult. Hence, we propose a fault detection service designed to be incorporated, in a modular fashion, into distributed computing systems, tools, or applications. This service uses well-known techniques based on unreliable fault detectors to detect and report component failure, while allowing the user to trade off timeliness of reporting against false-positive rates. We describe the architecture of this service, report on experimental results that quantify its cost and accuracy, and describe its use in two applications: monitoring the status of system components of the GUSTO computational grid testbed, and as part of the NetSolve network-enabled numerical solver.
Keywords :
computer network reliability; fault diagnosis; system monitoring; wide area networks; GUSTO; NetSolve; application developers; component failure; computational grid testbed; experimental results; fault correction; fault detection service; network-enabled numerical solver; wide area distributed computations; Application software; Computer networks; Computer science; Costs; Distributed computing; Fault detection; Grid computing; Laboratories; Mathematics; Resource management;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Distributed Computing, 1998. Proceedings. The Seventh International Symposium on
Conference_Location :
Chicago, IL
ISSN :
1082-8907
Print_ISBN :
0-8186-8579-4
Type :
conf
DOI :
10.1109/HPDC.1998.709981
Filename :
709981