Title :
Specifying fault tolerance in large complex computing systems
Author :
Hugue, Michelle M. ; Scalzo, Richard C.
Author_Institution :
Opsimath Res., Bowie, MD, USA
Abstract :
A difficult task in the process of engineering large complex computing systems is the derivation of requirements that assure that a system can supply the expected behavior in the presence of faults. This paper provides insight into the constraints placed upon the system during the requirements specification phase of the system life-cycle by quality of service requirements, such as performance, reliability, availability, safety, and timeliness. Focusing on a simple environmental control system, we examine the derivation of requirements to support system resiliency to faults. After partitioning possible system behaviors into acceptable and unacceptable sets, we provide guidelines for specifying acceptable behavior using the following process: define the normal behavior for a correct system; identify the fault hypothesis appropriate to the system and applications; and define the exceptional behavior for a partially correct system satisfying the system fault hypothesis. We motivate the specification of a health management function to assure acceptable behavior to the extent indicated by the quality of service requirements and the dependability constraints of availability, reliability, safety, integrity, confidentiality, and maintainability.
Keywords :
fault tolerant computing; formal specification; real-time systems; software performance evaluation; software reliability; availability; confidentiality; dependability constraints; fault hypothesis; fault tolerance specification; health management function; integrity; large complex computing systems; maintainability; partially correct system; performance; quality of service requirements; reliability; requirements specification phase; safety; system behaviors; system fault hypothesis; system life-cycle; timeliness; Availability; Constraint theory; Control systems; Fault tolerant systems; Fires; Guidelines; Health and safety; Quality management; Quality of service; Reliability theory
Conference_Title :
Proceedings of the First IEEE International Conference on Engineering of Complex Computer Systems, 1995. Held jointly with 5th CSESAW, 3rd IEEE RTAW and 20th IFAC/IFIP WRTP
Conference_Location :
Ft. Lauderdale, FL
Print_ISBN :
0-8186-7123-8
DOI :
10.1109/ICECCS.1995.479360