• DocumentCode
    844720
  • Title

    Dependability measurement and modeling of a multicomputer system

  • Author

    Tang, Dong ; Iyer, Ravishankar K.

  • Author_Institution
    Coordinated Sci. Lab., Illinois Univ., Urbana, IL, USA
  • Volume
    42
  • Issue
    1
  • fYear
    1993
  • fDate
    1/1/1993 12:00:00 AM
  • Firstpage
    62
  • Lastpage
    75
  • Abstract
    A measurement-based analysis of error data collected from a DEC VAXcluster multicomputer system is presented. Basic system dependability characteristics such as error/failure distributions and hazard rates are obtained for both the individual machines and the entire VAXcluster. Markov reward models are developed to analyze error/failure behavior and to evaluate performance loss due to errors/failures. Correlation analysis is then performed to quantify relationships of errors/failures across machines and across time. It is found that shared resources constitute a major reliability bottleneck. It is shown that, for the measured system, the homogeneous Markov model, which assumes constant failure rates, overestimates the transient reward rate for short-term operation and underestimates it for long-term operation. Correlation analysis shows that errors are highly correlated across machines and across time. The failure correlation coefficient is low; however, its effect on system unavailability is significant.
  • Keywords
    fault tolerant computing; multiprocessing systems; performance evaluation; DEC VAXcluster; Markov reward models; correlation analysis; dependability measurement; error data; hazard rate; measurement-based analysis; modeling; multicomputer system; performance loss; system dependability characteristics; system unavailability; transient reward rate; Analytical models; Availability; Data analysis; Error analysis; Failure analysis; Hazards; Military computing; Performance analysis; Performance evaluation; Performance loss
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/12.192214
  • Filename
    192214