Title :
Simulator for fault tolerance in large scale distributed systems
Author :
Boteanu, Adrian ; Dobre, Ciprian ; Pop, Florin ; Cristea, Valentin
Author_Institution :
Univ. Politeh. of Bucharest, Bucharest, Romania
Abstract :
We present a simulation model designed for evaluating fault tolerance solutions in large-scale distributed systems. The model extends the MONARC simulation model with new capabilities for fault tolerance simulation: it includes failure behavior and capabilities to detect and react to faults. We also present an implementation of this model in MONARC, together with specific evaluation results. The model's implementation considers permanent and transient failures occurring within processing units, network components, and databases. The model is easily extensible, allowing the addition of new failure models as required by different experiments. It can be used in conjunction with key performance metrics to pinpoint areas of failure within the simulated environments.
Keywords :
discrete event simulation; distributed processing; fault tolerant computing; large-scale systems; MONARC simulation model; fault tolerance simulator; key performance metric; large scale distributed system; Adaptation model; Analytical models; Biological system modeling; Computational modeling; Data models; Fault tolerance; Fault tolerant systems; distributed systems; fault tolerance; faults; performance analysis; simulation model;
Conference_Title :
Intelligent Computer Communication and Processing (ICCP), 2010 IEEE International Conference on
Conference_Location :
Cluj-Napoca
Print_ISBN :
978-1-4244-8228-3
Electronic_ISBN :
978-1-4244-8230-6
DOI :
10.1109/ICCP.2010.5606401