• DocumentCode
    63753
  • Title
    Dependability analysis for fault-tolerant computer systems using dynamic fault graphs
  • Author
    Zhao Feng ; Jin Hai ; Zou Deqing ; Qin Pan
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Huazhong Univ. of Sci. & Technol., Wuhan, China
  • Volume
    11
  • Issue
    9
  • fYear
    2014
  • fDate
    Sept. 2014
  • Firstpage
    16
  • Lastpage
    30
  • Abstract
    Dependability analysis is an important step in designing and analyzing safety computer systems and protection systems. Introducing multiprocessors and virtual machines increases the complexity, diversity and dynamics of system faults, in particular for software-induced failures, with an impact on the overall dependability. Moreover, it is very difficult for a safety system to operate successfully at any active phase, since there is a huge difference in failure rate between hardware-induced and software-induced failures. To handle these difficulties and achieve an accurate dependability evaluation that consistently reflects the construct it measures, a new formalism derived from dynamic fault graphs (DFG) is developed in this paper. DFG exploits the concept of a system event as a sequence of fault states to represent dynamic behaviors, which allows probabilistic measures to be computed at each timestamp when a change occurs. The approach automatically combines the reliability analysis with the system dynamics. In this paper, we describe how the proposed methodology drives the overall system dependability analysis through the phases of modeling, structural discovery and probability analysis, and we illustrate it using an example of a virtual computing system.
  • Keywords
    fault tolerant computing; graph theory; probability; DFG; active phase; dynamic behavior representation; dynamic fault graphs; failure rate; fault state sequences; fault-tolerant computer systems; hardware-induced failures; modeling phase; multiprocessors; probabilistic measures; probability analysis phase; protection system analysis; protection system design; reliability analysis; safety computer system analysis; safety computer system design; safety system; software-induced failures; structural discovery phase; system dependability analysis; system dynamics; system event; system fault complexity; system fault diversity; system fault dynamic; timestamp; virtual computing system; virtual machine; Computational modeling; Fault tolerance; Fault tolerant systems; Logic gates; Markov processes; Probabilistic logic; dependability analysis; dynamic fault-graph; fault-tolerant system; probability forecast; structural link;
  • fLanguage
    English
  • Journal_Title
    Communications, China
  • Publisher
    IEEE
  • ISSN
    1673-5447
  • Type
    jour
  • DOI
    10.1109/CC.2014.6969708
  • Filename
    6969708