• DocumentCode
    1566047
  • Title

    A Framework for Proactive Fault Tolerance

  • Author

    Vallée, Geoffroy ; Engelmann, Christian ; Tikotekar, Anand ; Naughton, Thomas ; Charoenpornwattana, Kulathep ; Leangsuksun, Chokchai ; Scott, Stephen L.

  • Author_Institution
    Oak Ridge Nat. Lab., Oak Ridge, TN
  • fYear
    2008
  • Firstpage
    659
  • Lastpage
    664
  • Abstract
    Fault tolerance is a major concern for guaranteeing the availability of critical services as well as application execution. Traditional approaches to fault tolerance include checkpoint/restart and duplication. However, it is also possible to anticipate failures and proactively take action before they occur in order to minimize their impact on the system and on application execution. This document presents a proactive fault tolerance framework. The framework can use different proactive fault tolerance mechanisms, i.e., migration and pause/un-pause. Thanks to its modular architecture, it also allows new proactive fault tolerance policies to be implemented. A first proactive fault tolerance policy has been implemented, and preliminary experiments based on system-level virtualization have been conducted and compared with results obtained by simulation.
  • Keywords
    fault tolerance; system recovery; checkpoint/restart approach; failure impact minimization; modular architecture; proactive fault tolerance policy; system application execution; Availability; Communication system control; Computer applications; Fault tolerance; Fault tolerant systems; Laboratories; Large-scale systems; Monitoring; National security; Prototypes; adaptation; clustering; proactive fault tolerance
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Availability, Reliability and Security, 2008. ARES 08. Third International Conference on
  • Conference_Location
    Barcelona
  • Print_ISBN
    978-0-7695-3102-1
  • Type

    conf

  • DOI
    10.1109/ARES.2008.171
  • Filename
    4529406