• DocumentCode
    2915470
  • Title

    Extending an SSI Cluster for Resource Discovery in Grid Computing

  • Author

    Echaiz, Javier ; Ardenghi, Jorge

  • Author_Institution
    Departamento de Ciencias e Ingenieria de la Computacion, Univ. Nacional del Sur, Bahia Blanca
  • fYear
    2006
  • fDate
    Oct. 2006
  • Firstpage
    287
  • Lastpage
    293
  • Abstract
    Grid technologies enable large-scale sharing of resources within formal or informal consortia of individuals and/or virtual organizations. In these settings, the discovery, characterization, and monitoring of resources, services, and computations can be challenging due to the considerable diversity, large numbers, dynamic behavior, and geographical distribution of the entities in which a user might be interested. Hence, information services are a vital part of any grid software infrastructure, providing fundamental mechanisms for discovery and monitoring, and thus for planning and adapting application behavior. This paper proposes a resource discovery system for grid computing with fault-tolerant capabilities, built on an SSI clustering operating system. The proposed system is centralized and uses dynamic leader-determination and dynamic (or soft-state) registration mechanisms to automatically detect and recover from node and network failures. Provisional or backup leader determination provides tolerance and recovery in the event of the leader node failing. The system was tested against a control network modeled after existing grid computing resource discovery components, such as the Globus Monitoring and Discovery System (MDS). In various failure scenarios, the proposed system showed better resilience and performance than the control system.
  • Keywords
    fault tolerant computing; grid computing; information services; operating systems (computers); Globus monitoring and discovery system; SSI clustering operating system; dynamic leader-determination; fault tolerance; grid software infrastructure; high performance computing; information service; large-scale resource sharing; registration mechanism; resource discovery; Application software; Automatic control; Condition monitoring; Distributed computing; Fault tolerant systems; Large-scale systems; Operating systems; Resilience; System testing; grid operating systems
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Grid and Cooperative Computing, 2006. GCC 2006. Fifth International Conference
  • Conference_Location
    Hunan
  • Print_ISBN
    0-7695-2694-2
  • Type

    conf

  • DOI
    10.1109/GCC.2006.43
  • Filename
    4031470