Title :
Application-transparent fault tolerance in distributed systems
Author_Institution :
Dept. of Comput. Sci., Kaiserslautern Univ., Germany
Abstract :
We present a new software architecture in which all mechanisms necessary to achieve fault tolerance can be added to an application automatically, without any source code changes. As a case study, we consider the problem of providing a reliable service despite node failures by executing a group of replicated servers. Replica creation and management, as well as failure detection and recovery, are performed automatically by a separate fault tolerance layer (ft-layer) that is inserted between the server application and the operating system kernel. The layer is invisible to the application because it provides the same functional interface as the operating system kernel, which makes the fault tolerance property of the service completely transparent to the application. A major advantage of our architecture is that the layer encapsulates both fault tolerance mechanisms and policies. This allows maximum flexibility in the choice of appropriate fault tolerance methods without any changes to the application code.
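The interposition idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `KernelInterface`, `Node`, `FtLayer`, and `application` are hypothetical, a single `send()` call stands in for the kernel's functional interface, and simple failover stands in for whatever replication policy the ft-layer actually uses. The point is only that the application code is identical whether it talks to a plain node or to the layer.

```python
from typing import List, Protocol


class KernelInterface(Protocol):
    """Hypothetical stand-in for the functional interface the kernel offers."""

    def send(self, message: str) -> str: ...


class Node:
    """Hypothetical node hosting one server replica; it may fail."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True

    def send(self, message: str) -> str:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {message}"


class FtLayer:
    """Offers the same send() interface but hides replica management."""

    def __init__(self, replicas: List[Node]) -> None:
        self.replicas = list(replicas)

    def send(self, message: str) -> str:
        # Failure detection and recovery live here, not in the application.
        # Assumed policy for this sketch: try replicas in order, drop failed ones.
        for replica in list(self.replicas):
            try:
                return replica.send(message)
            except ConnectionError:
                self.replicas.remove(replica)
        raise RuntimeError("all replicas failed")


def application(kernel: KernelInterface) -> str:
    """Application code: unchanged whether it is given a Node or an FtLayer."""
    return kernel.send("request-1")
```

Because the policy is confined to `FtLayer.send`, a different strategy (for example, sending to all replicas and voting) could replace the failover loop without touching `application`.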
Keywords :
distributed processing; fault tolerant computing; operating systems (computers); software engineering; software reliability; application code; application-transparent fault tolerance; distributed systems; failure detection; failure recovery; fault tolerance layer; functional interface; node failures; operating system kernel; reliable service; replicated servers; server application; software architecture; source code change; Computer science; Fault tolerance; Fault tolerant systems; Joining processes; Kernel; Libraries; Operating systems; Programming profession; Software architecture; Testing;
Conference_Title :
Proceedings of the 2nd International Workshop on Configurable Distributed Systems, 1994
Conference_Location :
Pittsburgh, PA
Print_ISBN :
0-8186-5390-6
DOI :
10.1109/IWCDS.1994.289937