DocumentCode
395586
Title
Software fault tolerance of distributed programs using computation slicing
Author
Mittal, Neeraj ; Garg, Vijay K.
Author_Institution
Dept. of Comput. Sci., Texas Univ. at Dallas, Richardson, TX, USA
fYear
2003
fDate
19-22 May 2003
Firstpage
105
Lastpage
113
Abstract
Writing correct distributed programs is hard. In spite of extensive testing and debugging, software faults persist even in commercial-grade software. Many distributed systems, especially those employed in safety-critical environments, should be able to operate properly even in the presence of software faults. Monitoring the execution of a distributed system and, on detecting a fault, initiating the appropriate corrective action is an important way to tolerate such faults. This gives rise to the predicate detection problem, which involves finding a consistent cut of a distributed computation, if one exists, that satisfies the given global predicate. Detecting a predicate in a computation is, however, an NP-complete problem. To ameliorate the associated combinatorial explosion problem, we introduced the notion of a computation slice in our earlier papers [5, 10]. Intuitively, a slice is a concise representation of those consistent cuts that satisfy a certain condition. To detect a predicate, rather than searching the state-space of the computation, it is much more efficient to search the state-space of the slice. In this paper, we provide efficient algorithms to compute the slice for several classes of predicates. Our experimental results demonstrate that slicing can lead to an exponential improvement over existing techniques in terms of time and space.
Keywords
computational complexity; distributed algorithms; program debugging; program slicing; software fault tolerance; computation slicing; distributed program; partial-order method; predicate detection; search-space pruning; software debugging; software testing; software-fault tolerance; Distributed computing; Explosions; Fault detection; Fault tolerance; Monitoring; NP-complete problem; Software debugging; Software safety; Software testing; Writing
fLanguage
English
Publisher
IEEE
Conference_Titel
Distributed Computing Systems, 2003. Proceedings. 23rd International Conference on
ISSN
1063-6927
Print_ISBN
0-7695-1920-2
Type
conf
DOI
10.1109/ICDCS.2003.1203457
Filename
1203457