Title :
Deterministic Replay for Transparent Recovery in Component-Oriented Middleware
Author :
Strom, Rob ; Dorai, Chitra ; Feng, Thomas Huining ; Zheng, Wei
Author_Institution :
IBM T.J. Watson Res. Center, Hawthorne, NY, USA
Abstract :
We present and evaluate a low-overhead approach for achieving high availability in distributed event-processing middleware systems consisting of networks of stateful software components that communicate by either one-way (send) or two-way (call) messages. The approach is based on transparently augmenting each component to produce a deterministic component whose state can be recovered by checkpoint and replay. Determinism is achieved by augmenting messages with virtual times and by scheduling message handling in virtual-time order. Scheduling delays are reduced by computing virtual times with estimators: deterministic functions that approximate the expected real times of arrival. We describe our algorithms, show how Java components can be transparently augmented with checkpointing code and with good estimators, discuss how our deterministic runtime can be tuned to reduce overhead, and provide experimental results that measure the overhead of deterministic execution relative to non-deterministic execution.
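The virtual-time idea in the abstract can be illustrated with a minimal Java sketch. It is not the paper's runtime: the names `Message`, `FixedDelayEstimator`, and `VirtualTimeScheduler` are hypothetical, the estimator is assumed to be a fixed per-link delay added to the sender's virtual send time, and the scheduler simply batches all arrivals before handling them, whereas a real runtime would have to decide when no earlier message can still arrive. The point it shows is that handling order depends only on virtual times, not on real arrival order, which is what makes replay after a failure reproduce the same state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical message type: a payload stamped with a virtual arrival time.
final class Message {
    final String sender;
    final long virtualTime;
    final String payload;

    Message(String sender, long virtualTime, String payload) {
        this.sender = sender;
        this.virtualTime = virtualTime;
        this.payload = payload;
    }
}

// Assumed estimator form: a deterministic function of the virtual send time.
// Here it is just "send time plus a fixed estimated link delay".
final class FixedDelayEstimator {
    private final long estimatedDelay;

    FixedDelayEstimator(long estimatedDelay) {
        this.estimatedDelay = estimatedDelay;
    }

    long arrivalTime(long sendVirtualTime) {
        return sendVirtualTime + estimatedDelay;
    }
}

final class VirtualTimeScheduler {
    // Total order: virtual time first, sender name as a deterministic tie-break.
    private static final Comparator<Message> ORDER =
        Comparator.comparingLong((Message m) -> m.virtualTime)
                  .thenComparing((Message m) -> m.sender);

    // Simplification: collect every arrival, then handle in virtual-time order.
    static List<String> handleInOrder(List<Message> arrivals) {
        PriorityQueue<Message> queue = new PriorityQueue<>(ORDER);
        queue.addAll(arrivals);
        List<String> handled = new ArrayList<>();
        while (!queue.isEmpty()) {
            handled.add(queue.poll().payload);
        }
        return handled;
    }
}

final class VirtualTimeReplayDemo {
    public static void main(String[] args) {
        FixedDelayEstimator fromA = new FixedDelayEstimator(5);
        FixedDelayEstimator fromB = new FixedDelayEstimator(3);

        Message a1 = new Message("A", fromA.arrivalTime(10), "a1"); // virtual time 15
        Message b1 = new Message("B", fromB.arrivalTime(11), "b1"); // virtual time 14
        Message a2 = new Message("A", fromA.arrivalTime(12), "a2"); // virtual time 17

        // Two different real arrival orders yield the same handling order.
        System.out.println(VirtualTimeScheduler.handleInOrder(Arrays.asList(a1, b1, a2)));
        System.out.println(VirtualTimeScheduler.handleInOrder(Arrays.asList(a2, b1, a1)));
    }
}
```

Both calls print `[b1, a1, a2]`. The better the estimator tracks real arrival times, the less often a message that has already arrived must wait for one with an earlier virtual time.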
Keywords :
Java; checkpointing; distributed algorithms; message passing; middleware; object-oriented programming; scheduling; Java component; checkpointing code; component-oriented middleware; distributed event processing middleware system; scheduling message handling; software component; transparent recovery; delay estimation; discrete event simulation; distributed computing; logic; processor scheduling; runtime; determinism; high availability; recovery; replay; virtual time
Conference_Title :
29th IEEE International Conference on Distributed Computing Systems (ICDCS '09), 2009
Conference_Location :
Montreal, QC
Print_ISBN :
978-0-7695-3659-0
ISSN :
1063-6927
DOI :
10.1109/ICDCS.2009.79