DocumentCode :
3437347
Title :
A Transparent Control-Flow Based Approach to Record-Replay Non-deterministic Bugs
Author :
Wang, Nan ; Han, Jizhong ; Fang, Jinyun
Author_Institution :
Inst. of Comput. Technol., Beijing, China
fYear :
2012
fDate :
28-30 June 2012
Firstpage :
189
Lastpage :
198
Abstract :
Record-replay is an effective way to reproduce non-deterministic bugs and has gained attention in the research community. However, current approaches fall short of handling non-deterministic bugs on multi-processor platforms and in distributed systems, for several reasons. First, multi-threaded programs on multi-processor platforms, which are common in today's distributed systems, are difficult to record and replay because of data races. Second, increasing system scale makes production environments more sensitive to perturbation from recording. Even hacking control scripts has become unacceptable because of the growing complexity that comes from the variety of programs and the large number of computing cores. Third, when deployed in distributed systems, large scale also multiplies the recording traces, which overwhelms developers and slows down the whole system dramatically. To address these issues, we propose the following mechanisms for efficient record-replay in multi-processor distributed systems: control-flow based record-replay, low-perturbation loading, and proportion sampling. We have implemented these mechanisms in ReBranch, a practical record-replay system for debugging multi-threaded programs on multi-processor platforms and in distributed systems. ReBranch has already proven effective on real bugs. We also present our debugging experience with ReBranch through a case study of a bug in memcached, an important component in many commercial systems.
Keywords :
multi-threading; multiprocessing systems; program debugging; ReBranch; boosting complexity; control-flow based record-replay; hacking control scripts; low-perturbation loading; memcached; multiprocessor distributed systems; multiprocessor platforms; multithread programs; non-deterministic bugs; practical record-replay system; proportion sampling; transparent control-flow based approach; Computer bugs; Control systems; Debugging; Hardware; Instruction sets; Loading; Process control; debugging; distributed systems; parallel programming; record-replay;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Networking, Architecture and Storage (NAS), 2012 IEEE 7th International Conference on
Conference_Location :
Xiamen, Fujian
Print_ISBN :
978-1-4673-1889-1
Type :
conf
DOI :
10.1109/NAS.2012.28
Filename :
6310893