• DocumentCode
    2943664
  • Title
    Application Cluster Service Scheme for Near-Zero-Downtime Services
  • Author
    Cheng, Fan-tien; Wu, Shang-Lun; Tsai, Ping-Yen; Chung, Yun-Ta; Yang, Haw-Ching
  • Author_Institution
    Institute of Manufacturing Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.; e-mail: chengft@mail.ncku.edu.tw
  • fYear
    2005
  • fDate
    18-22 April 2005
  • Firstpage
    4062
  • Lastpage
    4067
  • Abstract
    Applications of a distributed computer system are required to provide continuous service 24 hours a day, 7 days a week. However, computer failures caused by exhaustion of operating system resources, data corruption, accumulation of numerical errors, and similar problems may interrupt services and cause significant losses. Hence, this work proposes an application cluster service (APCS) scheme. The proposed APCS provides both a failover scheme and a state recovery scheme for failure management. The failover scheme is designed mainly to automatically activate a backup application that replaces the failed application whenever the latter becomes sick or goes down. Meanwhile, the state recovery scheme is intended primarily to provide an inheritable design pattern that supports applications with state recovery requirements: an application simply needs to inherit and implement this design pattern, and it can then accomplish state backup and recovery. Furthermore, this study develops a performance evaluator (PEV) that can detect performance degradation and predict the time to failure. Using these detection and prediction capabilities, the APCS can perform the failover process before a node breaks down. Thus, applying APCS and PEV enables a distributed computer system to provide services with near-zero downtime.
  • Keywords
    Application cluster service (APCS); failover scheme; near-zero-downtime service; performance evaluator (PEV); state recovery scheme; Application software; Availability; Computer errors; Control engineering; Degradation; Distributed computing; Manufacturing; Middleware; Operating systems; Reliability engineering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the 2005 IEEE International Conference on Robotics and Automation (ICRA 2005)
  • Print_ISBN
    0-7803-8914-X
  • Type
    conf
  • DOI
    10.1109/ROBOT.2005.1570743
  • Filename
    1570743