DocumentCode
2915470
Title
Extending an SSI Cluster for Resource Discovery in Grid Computing
Author
Echaiz, Javier ; Ardenghi, Jorge
Author_Institution
Departamento de Ciencias e Ingenieria de la Computacion, Univ. Nacional del Sur, Bahia Blanca
fYear
2006
fDate
Oct. 2006
Firstpage
287
Lastpage
293
Abstract
Grid technologies enable large-scale sharing of resources within formal or informal consortia of individuals and/or virtual organizations. In these settings, the discovery, characterization, and monitoring of resources, services, and computations can be challenging due to the considerable diversity, large numbers, dynamic behavior, and geographical distribution of the entities in which a user might be interested. Hence, information services are a vital part of any grid software infrastructure, providing fundamental mechanisms for discovery and monitoring, and thus for planning and adapting application behavior. This paper proposes a fault-tolerant resource discovery system for grid computing, built on a single system image (SSI) clustering operating system. The proposed system uses dynamic leader-determination and registration mechanisms to recover automatically from node and network failures. The system is centralized and uses dynamic (or soft-state) registration to detect and recover from failures. Provisional or backup leader determination provides fault tolerance and recovery when the leader node fails. The system was tested against a control network modeled after existing grid computing resource discovery components, such as the Globus Monitoring and Discovery System (MDS). In various failure scenarios, the proposed system showed better resilience and performance than the control system.
Keywords
fault tolerant computing; grid computing; information services; operating systems (computers); Globus monitoring and discovery system; SSI clustering operating system; dynamic leader-determination; fault tolerance; grid software infrastructure; high performance computing; information service; large-scale resource sharing; registration mechanism; resource discovery; Application software; Automatic control; Condition monitoring; Distributed computing; Fault tolerant systems; Large-scale systems; Operating systems; Resilience; System testing; grid operating systems
fLanguage
English
Publisher
IEEE
Conference_Titel
Grid and Cooperative Computing, 2006. GCC 2006. Fifth International Conference
Conference_Location
Hunan
Print_ISBN
0-7695-2694-2
Type
conf
DOI
10.1109/GCC.2006.43
Filename
4031470