Title :
TFT: a software system for application-transparent fault tolerance
Author_Institution :
Stratus Comput. Inc., Marlborough, MA, USA
Abstract :
An important objective of software fault-tolerant systems should be to provide a fault-tolerance infrastructure in a manner that minimizes the effort required of the application developer. In the limit, the objective is to provide fault tolerance transparently to the application. TFT, the work presented in this paper, provides transparent fault tolerance at a higher interface than prior solutions. TFT coordinates replicas at the system call interface, interposing a supervisor agent between the application and the operating system. Moving replica coordination to this interface allows uncorrelated faults within the operating system and below to be tolerated, and also admits the possibility of online operating system and hardware upgrades. To accomplish its task, TFT must enforce a deterministic computation above the system call interface. The potential sources of non-determinism addressed include non-deterministic system calls, delivery of asynchronous events, and the representation of operating system abstractions that differ between replicas.
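The idea of coordinating replicas at the system call interface can be illustrated with a small sketch. It is not TFT's implementation. It assumes a primary/backup arrangement in which the primary executes each non-deterministic call and forwards the result, and the backup consumes that result instead of executing the call itself. The names Supervisor, syscall, application and the in-memory channel are hypothetical and exist only for this illustration.

```python
import random
import time
from collections import deque


class Supervisor:
    """Hypothetical supervisor agent placed between an application replica and the OS."""

    def __init__(self, role, channel):
        if role not in ("primary", "backup"):
            raise ValueError("role must be 'primary' or 'backup'")
        self.role = role
        # Assumption: a shared deque stands in for the replica-to-replica link.
        self.channel = channel

    def syscall(self, name, fn, *args):
        """Intercept one call whose result may differ between replicas."""
        if self.role == "primary":
            result = fn(*args)                    # actually execute the call
            self.channel.append((name, result))   # forward the outcome to the backup
            return result
        # Backup: do not execute the call; reuse the primary's recorded outcome.
        logged_name, result = self.channel.popleft()
        if logged_name != name:
            raise RuntimeError(
                f"replica divergence: expected {logged_name!r}, got {name!r}"
            )
        return result


def application(supervisor):
    """Unmodified application logic; it sees only the supervisor's interface."""
    now = supervisor.syscall("time", time.time_ns)
    nonce = supervisor.syscall("random", random.random)
    return (now, nonce)


if __name__ == "__main__":
    link = deque()
    primary_view = application(Supervisor("primary", link))
    backup_view = application(Supervisor("backup", link))
    print(primary_view == backup_view)
```

Because the backup never calls the clock or the random number generator itself, both replicas observe identical values and follow the same execution path above the interception point. Asynchronous event delivery and operating system abstractions that differ between replicas, which the abstract also lists, are not covered by this sketch.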
Keywords :
operating system kernels; software fault tolerance; TFT; asynchronous events; deterministic computation; fault-tolerance infrastructure; nondeterministic system calls; operating syste; operating system abstractions; software fault tolerant systems; supervisor agent; system call interface; transparent fault tolerance; uncorrelated faults; Application software; Costs; Fault tolerance; Fault tolerant systems; Hardware; Operating systems; Programming profession; Software systems; Thin film transistors; Virtual machine monitors;
Conference_Title :
Fault-Tolerant Computing, 1998. Digest of Papers. Twenty-Eighth Annual International Symposium on
Conference_Location :
Munich, Germany
Print_ISBN :
0-8186-8470-4
DOI :
10.1109/FTCS.1998.689462