DocumentCode
330834
Title
Reliability analysis of clustered computing systems
Author
Mendiratta, Veena B.
Author_Institution
AT&T Bell Labs., Naperville, IL, USA
fYear
1998
fDate
4-7 Nov 1998
Firstpage
268
Lastpage
272
Abstract
Clustered computing systems, using commercially available computers networked in a loosely-coupled fashion, can provide high levels of reliability if appropriate levels of error detection and recovery software are implemented in the middleware and application layers. In this paper, we present a modeling approach for analyzing the hardware and software reliability of clustered computing systems. The clustered system is modeled as an irreducible Markov chain with working and failed states, and intermediate recovery states. The failure and recovery behavior is characterized in terms of the frequency and duration of fault recoveries and outages for a single processor in the cluster and for the entire clustered system. We apply the model to a telecommunication switching system application that uses the Lucent Technologies Reliable Clustered Computing product. The model results are presented for a range of values of the processor failure rate and the fault recovery coverage factor.
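The kind of model the abstract describes can be illustrated with a minimal sketch. This is not the paper's model: it assumes a hypothetical three-state continuous-time Markov chain for a single processor (up, recovering, down), in which failures occur at rate `lam`, a fraction `c` (the coverage factor) is handled by automatic recovery at rate `mu_r`, and the remaining fraction leads to an outage repaired at rate `mu_d`. All names and parameter values are assumptions chosen for illustration.

```python
def steady_state(lam, c, mu_r, mu_d):
    """Steady-state probabilities (up, recovering, down) of a 3-state chain.

    Hypothetical parameters, not taken from the paper:
      lam  - processor failure rate
      c    - fault recovery coverage factor, between 0 and 1
      mu_r - rate of leaving the automatic-recovery state
      mu_d - rate of leaving the outage (down) state
    """
    # Balance equations: flow into each non-up state equals flow out of it.
    #   pi_up * c * lam       = pi_rec  * mu_r
    #   pi_up * (1 - c) * lam = pi_down * mu_d
    r_rec = c * lam / mu_r
    r_down = (1.0 - c) * lam / mu_d
    # Normalization: the three probabilities must sum to 1.
    pi_up = 1.0 / (1.0 + r_rec + r_down)
    return pi_up, pi_up * r_rec, pi_up * r_down


def outage_frequency(lam, c, mu_r, mu_d):
    """Long-run rate of entering the down state (outages per unit time)."""
    pi_up, _, _ = steady_state(lam, c, mu_r, mu_d)
    return pi_up * (1.0 - c) * lam


if __name__ == "__main__":
    # Sweep the coverage factor for one assumed failure rate (per hour).
    for coverage in (0.90, 0.95, 0.99):
        up, rec, down = steady_state(1e-4, coverage, 60.0, 0.5)
        print(coverage, up, rec, down, outage_frequency(1e-4, coverage, 60.0, 0.5))
```

In this sketch, raising the coverage factor moves probability mass from the down state to the short-lived recovery state, which is the sensitivity the abstract reports studying over a range of failure rates and coverage values.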
Keywords
Markov processes; client-server systems; computer network reliability; electronic switching systems; error detection; software reliability; switching networks; system recovery; telecommunication computing; workstation clusters; Lucent Technologies Reliable Clustered Computing product; application layers; clustered computing systems reliability; commercially available computers; error detection; error recovery software; failed states; failure behavior; fault recovery behavior; fault recovery coverage factor; hardware reliability; intermediate recovery states; irreducible Markov chain; loosely-coupled computer network; middleware; modeling; networked computers; outages; processor failure rate; software reliability; telecommunication switching system; working states; Application software; Computer errors; Computer network reliability; Computer networks; Frequency; Hardware; Middleware; Software reliability; Telecommunication computing; Telecommunication switching;
fLanguage
English
Publisher
IEEE
Conference_Titel
Software Reliability Engineering, 1998. Proceedings. The Ninth International Symposium on
Conference_Location
Paderborn
ISSN
1071-9458
Print_ISBN
0-8186-8991-9
Type
conf
DOI
10.1109/ISSRE.1998.730890
Filename
730890