DocumentCode :
2737347
Title :
Timely failure detection in a large distributed real-time system
Author :
Ng, Tony P. ; Patel, Vikram N.
Author_Institution :
Loral Federal Syst., Air Traffic Control
fYear :
1994
fDate :
24-25 Oct 1994
Firstpage :
118
Lastpage :
123
Abstract :
The paper describes the experience of designing and implementing failure detection and reporting in a large distributed real-time system used for air traffic control (ATC). We believe that systematic analysis is needed to guide the failure detection design and track the large number of failures that it deals with. Analysis such as how fast failures have to be detected should be performed carefully to avoid redesigns later. A comprehensive analysis also provides a basis for testing the design subsequently, during which fault injection and extended testing are needed to evaluate and debug the design. Failure detectors should detect specific failures so that appropriate reports and recovery actions can be initiated after detection.
Keywords :
Air traffic control; Availability; Computer crashes; Databases; Detectors; Failure analysis; Hardware; Operating systems; Real time systems; Testing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Object-Oriented Real-Time Dependable Systems, 1994. Proceedings of WORDS 94., First Workshop on
Conference_Location :
Dana Point, CA
Print_ISBN :
0-8186-7083-5
Type :
conf
DOI :
10.1109/WORDS.1994.518680
Filename :
518680
Link To Document :