Title :
Dependability analysis of fault-tolerant multiprocessor systems by probabilistic simulation
Author :
Danilenko, Ivan ; Dmitrieva, Elena ; Tsapko, Gennadij
Author_Institution :
Tomsk Polytech. Univ., Russia
Date :
26 Jun-3 Jul 2001
Abstract :
The objective of this research is to develop a new approach for evaluating the dependability of fault-tolerant computer systems. Dependability has traditionally been evaluated through combinatorial and Markov modelling. These analytical techniques have several limitations, which can restrict their applicability. Simulation avoids many of these limitations, allowing for a more precise representation of system attributes than is feasible with analytical modelling. However, the computational demands of simulating a system in detail, at a low abstraction level, currently prohibit evaluation of high-level dependability metrics such as reliability and availability. The new approach abstracts a system at the architectural level and employs life testing through simulated fault injection to measure dependability accurately and efficiently. The simulation models needed to implement this approach are derived, in part, from the published results of computer performance studies and low-level fault-injection experiments. The developed probabilistic models of processor, memory and fault-tolerant mechanisms account for properties of real systems such as error propagation, different failure modes, event dependency and concurrency. They have been integrated with a workload model and a statistical analysis module into a generalised software tool. The effectiveness of the approach was demonstrated through the analysis of several multiprocessor architectures.
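The idea of life testing through simulated fault injection can be illustrated with a minimal Monte Carlo sketch. It is not the authors' tool or models: the exponential fault model, the single coverage probability standing in for error propagation, and all names (`simulate_lifetime`, `estimate_reliability`, `n_procs`, `min_procs`, `fail_rate`, `coverage`) are assumptions made only for illustration.

```python
import math
import random


def simulate_lifetime(rng, n_procs, min_procs, fail_rate, coverage):
    """Return the failure time of one simulated system (assumed model).

    Faults arrive at rate `fail_rate` per working processor. A fault is
    covered (the faulty processor is removed) with probability `coverage`;
    an uncovered fault propagates and brings the whole system down.
    """
    t = 0.0
    working = n_procs
    while working >= min_procs:
        # Time to the next injected fault among the working processors.
        t += rng.expovariate(working * fail_rate)
        if rng.random() >= coverage:
            return t  # uncovered fault: system failure
        working -= 1  # covered fault: reconfigure without the faulty unit
    return t  # too few processors left: system failure


def estimate_reliability(mission_time, trials, seed=0, **system):
    """Estimate R(mission_time) and a 95% normal-approximation half-width."""
    rng = random.Random(seed)
    survived = sum(
        simulate_lifetime(rng, **system) > mission_time for _ in range(trials)
    )
    r = survived / trials
    half_width = 1.96 * math.sqrt(r * (1.0 - r) / trials)
    return r, half_width


if __name__ == "__main__":
    # Hypothetical 3-processor system that needs at least 2 processors.
    r, hw = estimate_reliability(
        100.0, 20000, seed=1,
        n_procs=3, min_procs=2, fail_rate=1e-3, coverage=0.95,
    )
    print(f"R(100) ~ {r:.4f} +/- {hw:.4f}")
```

Each trial plays the role of one life test; the statistical step simply counts survivors at the mission time. With perfect coverage the estimate can be checked against the closed-form result for this simple model, which is the kind of case where analytical modelling still works and simulation is not needed.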
Keywords :
fault tolerant computing; multiprocessing systems; performance evaluation; probability; statistical analysis; virtual machines; computer performance; error propagation; experiments; fault tolerant computer systems; fault-injection; life testing; multiprocessor systems; probabilistic models; simulation; system dependability; workload model; Abstracts; Analytical models; Availability; Computational modeling; Computer simulation; Fault tolerant systems; Mechanical factors
Conference_Title :
Science and Technology, 2001. KORUS '01. Proceedings. The Fifth Russian-Korean International Symposium on
Conference_Location :
Tomsk
Print_ISBN :
0-7803-7008-2
DOI :
10.1109/KORUS.2001.975079