DocumentCode :
3172847
Title :
How fail-stop are faulty programs?
Author :
Chandra, S. ; Chen, P.M.
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Michigan Univ., MI, USA
fYear :
1998
fDate :
23-25 June 1998
Firstpage :
240
Lastpage :
249
Abstract :
Most fault-tolerant systems are designed to stop faulty programs before they write permanent data or communicate with other processes. This property (halt-on-failure) forms the core of the fail-stop model. Unfortunately, little experimental data exists on whether or not program failures follow the fail-stop model. This paper describes a tool, based on the SimOS complete-machine simulator, that can trace how faults propagate through memory, disk, and functions. Using this tool on the Postgres database system, we conduct a controlled experiment to measure how often faulty programs violate the fail-stop model. We find that a significant number of faults (7%) violate the fail-stop model by writing incorrect data to stable storage before halting. We then apply Postgres' transaction mechanism to undo recent changes before a crash and find that transactions reduce fail-stop violations by a factor of 3.
Keywords :
relational databases; software fault tolerance; system recovery; transaction processing; virtual machines; Postgres database; SimOS; complete-machine simulator; experiment; fail-stop model; fault-tolerant systems; faulty programs; halt-on-failure; Application software; Computer bugs; Computer science; Condition monitoring; Fault detection; Kernel; Software systems; System software; Transaction databases; Workstations
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Fault-Tolerant Computing, 1998. Digest of Papers. Twenty-Eighth Annual International Symposium on
Conference_Location :
Munich, Germany
ISSN :
0731-3071
Print_ISBN :
0-8186-8470-4
Type :
conf
DOI :
10.1109/FTCS.1998.689475
Filename :
689475