DocumentCode :
3507059
Title :
Using a Failure History Service for Reliable Grid Node Information
Author :
Leordeanu, Catalin ; Cristea, Valentin ; Ropars, Thomas ; Jegou, Yvon ; Morin, Christine
Author_Institution :
Fac. of Autom. Control & Comput., Univ. Politeh. of Bucharest, Bucharest, Romania
fYear :
2010
fDate :
4-6 Nov. 2010
Firstpage :
37
Lastpage :
44
Abstract :
Reliability in Grid systems is a difficult and important challenge in the context of highly dynamic systems composed of thousands of nodes. Failure management is a key component in the attempt to provide such a reliable environment. The proposed approach relies on accurate failure information about the nodes in the Grid, which is very difficult to obtain in large-scale systems. This paper proposes a failure history service used to share failure information, which is critical to the management of resources in large-scale distributed systems, thus improving overall reliability. This novel service ensures that the information about the current state of a node, as well as its failure history, is as accurate as possible even when facing a large number of node failures. The solution aims to increase the reliability of Grid systems by providing accurate data that can be used to analyze failures over time.
Keywords :
grid computing; software fault tolerance; failure history service; failure management; grid systems; node failures; reliable grid node information; failure detection; failure history; grid; Pastry; Vigne
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2010 International Conference on
Conference_Location :
Fukuoka
Print_ISBN :
978-1-4244-8538-3
Electronic_ISBN :
978-0-7695-4237-9
Type :
conf
DOI :
10.1109/3PGCIC.2010.11
Filename :
5662745