Title :
Using program analysis to identify and compensate for nondeterminism in fault-tolerant, replicated systems
Author :
Slember, Joseph G. ; Narasimhan, Priya
Author_Institution :
Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
Fault-tolerant replicated applications are typically assumed to be deterministic in order to ensure reproducible, consistent behavior and state across a distributed system. Real applications, however, often contain nondeterministic features that cannot be eliminated. Through the novel application of program analysis to distributed CORBA applications, we decompose an application into its constituent structures and discover the kinds of nondeterminism present within it. We target the instances of nondeterminism that can be compensated for automatically, and highlight to the application programmer those instances that need to be rectified manually. We demonstrate our approach by compensating for specific forms of nondeterminism and by quantifying the associated performance overheads. The resulting code growth is typically limited to one extra line for every instance of nondeterminism, and the runtime overhead is minimal compared to a fault-tolerant application with no compensation for nondeterminism.
Keywords :
distributed object management; fault tolerant computing; object-oriented programming; program diagnostics; consistent behavior; distributed CORBA applications; distributed system; fault-tolerant application; fault-tolerant replicated applications; fault-tolerant replicated systems; nondeterministic features; program analysis; Application software; Computer crashes; Distributed computing; Engineering profession; Fault diagnosis; Fault tolerance; Fault tolerant systems; Operating systems; Programming profession; Runtime
Conference_Title :
Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004
Print_ISBN :
0-7695-2239-4
DOI :
10.1109/RELDIS.2004.1353026