• DocumentCode
    2852341
  • Title
    A Fault Tolerance Scheme for Hierarchical Dynamic Schedulers in Grids
  • Author
    Gorde, Nitin B.; Aggarwal, Sanjeev K.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., IIT Kanpur, Kanpur
  • fYear
    2008
  • fDate
    8-12 Sept. 2008
  • Firstpage
    53
  • Lastpage
    58
  • Abstract
    In dynamic grid environments, failures (e.g., link failures and resource failures) are frequent. We present a fault tolerance scheme for the hierarchical dynamic scheduler (HDS) for grid workflow applications. In HDS, all resources are arranged in a hierarchy tree, and each resource acts as a scheduler. The fault tolerance scheme is fully distributed and is responsible for maintaining the hierarchy tree in the presence of failures. Our scheme handles root failures specially, which prevents the root from becoming a single point of failure. The resources that detect failures are responsible for taking the appropriate actions. The scheme uses randomization to handle multiple simultaneous failures. Our simulation results show that recovery is fast and that failures have minimal impact on the scheduling process.
  • Keywords
    grid computing; scheduling; software fault tolerance; system recovery; fault tolerance; grid environment failures; grid workflow; hierarchical dynamic schedulers; recovery process; Concrete; Delay; Dynamic scheduling; Fault detection; Fault tolerance; Heart beat; Portals; Processor scheduling; Scheduling algorithm; Tree data structures; Fault tolerance scheme; Grid; schedulers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Parallel Processing - Workshops, 2008. ICPP-W '08. International Conference on
  • Conference_Location
    Portland, OR
  • ISSN
    1530-2016
  • Print_ISBN
    978-0-7695-3375-9
  • Electronic_ISBN
    1530-2016
  • Type
    conf
  • DOI
    10.1109/ICPP-W.2008.7
  • Filename
    4626780