Title :
Action models: a reliability modeling formalism for fault-tolerant distributed computing systems
Author :
Van Moorsel, Aad P A
Author_Institution :
Distributed Software Res. Dept., AT&T Bell Labs., Murray Hill, NJ, USA
Abstract :
Modern-day computing system design and development is characterized by increasing system complexity and ever-shortening time to market. For modeling techniques to be deployed successfully, they must conveniently deal with complex system models, and must be quick and easy to use by non-specialists. In this paper we introduce “action models”, a modeling formalism that tries to achieve the above goals for reliability evaluation of fault-tolerant distributed computing systems, including both software and hardware in the analysis. The metric of interest in action models is the job success probability, and we argue that the traditional availability metric is insufficient for the evaluation of fault-tolerant distributed systems. We formally specify action models, and introduce path-based solution algorithms to deal with the potential solution complexity of created models. In addition, we show several examples of action models, and use a preliminary tool implementation to obtain reliability results for a reliable clustered computing platform.
Keywords :
computational complexity; distributed processing; fault tolerant computing; action models; fault-tolerant distributed computing systems; path-based solution; reliability evaluation; reliability modeling formalism; reliable clustered computing platform; system complexity; Availability; Clustering algorithms; Distributed computing; Fault tolerant systems; Fault trees; Hardware; Petri nets; Stochastic systems; Telecommunication switching; Time to market
Conference_Title :
Computer Performance and Dependability Symposium, 1998. IPDS '98. Proceedings. IEEE International
Conference_Location :
Durham, NC
Print_ISBN :
0-8186-8679-0
DOI :
10.1109/IPDS.1998.707715