DocumentCode
844720
Title
Dependability measurement and modeling of a multicomputer system
Author
Tang, Dong ; Iyer, Ravishankar K.
Author_Institution
Coordinated Sci. Lab., Illinois Univ., Urbana, IL, USA
Volume
42
Issue
1
fYear
1993
fDate
1/1/1993 12:00:00 AM
Firstpage
62
Lastpage
75
Abstract
A measurement-based analysis of error data collected from a DEC VAXcluster multicomputer system is presented. Basic system dependability characteristics, such as error/failure distributions and hazard rate, are obtained for both the individual machines and the entire VAXcluster. Markov reward models are developed to analyze error/failure behavior and to evaluate performance loss due to errors/failures. Correlation analysis is then performed to quantify relationships of errors/failures across machines and across time. It is found that shared resources constitute a major reliability bottleneck. It is shown that, for the measured system, the homogeneous Markov model, which assumes constant failure rates, overestimates the transient reward rate for short-term operation and underestimates it for long-term operation. Correlation analysis shows that errors are highly correlated across machines and across time. The failure correlation coefficient is low; however, its effect on system unavailability is significant.
Keywords
fault tolerant computing; multiprocessing systems; performance evaluation; DEC VAXcluster; Markov reward models; correlation analysis; dependability measurement; error data; hazard rate; measurement-based analysis; modeling; multicomputer system; performance loss; system dependability characteristics; system unavailability; transient reward rate; Analytical models; Availability; Data analysis; Error analysis; Failure analysis; Hazards; Military computing; Performance analysis; Performance evaluation; Performance loss
fLanguage
English
Journal_Title
Computers, IEEE Transactions on
Publisher
IEEE
ISSN
0018-9340
Type
jour
DOI
10.1109/12.192214
Filename
192214