Title :
State checksum and its role in system stabilization
Author :
Huang, Chin-Tser ; Gouda, Mohamed G.
Author_Institution :
Dept. of Comput. Sci. & Eng., South Carolina Univ., Columbia, SC, USA
Abstract :
Although a self-stabilizing system that suffers from a transient fault is guaranteed to converge to a legitimate state after a finite number of steps, the convergence can be slow if the harmful effects of the fault are allowed to propagate into many processes in the system. Moreover, some safety properties of the system may be violated during the convergence. To address these problems, we propose in this paper the concept of a state checksum - a redundancy that can be added to the state of a self-stabilizing system so that some classes of faults become visible to the system, and the system can limit the propagation of their harmful effects, and maintain its safety properties during the convergence. To make these concepts concrete, we discuss the case study of a token ring and show how to use fault-detecting and fault-correcting checksums to detect visible faults, limit the propagation of their harmful effects, and ensure that the safety properties of the ring are maintained during the convergence from these faults.
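As a rough illustration of the idea in the abstract, the sketch below attaches a fault-detecting checksum to each process of a Dijkstra-style K-state token ring. It is not the authors' construction: the CRC-32 checksum, the names `Process`, `fault_visible`, `write`, and `step`, and the rule that a process refuses to act when its own or its predecessor's checksum fails are all assumptions made for this example.

```python
import zlib
from dataclasses import dataclass


def checksum(counter: int) -> int:
    # Assumed redundancy: CRC-32 over the counter (the paper may use another code).
    return zlib.crc32(counter.to_bytes(8, "big"))


@dataclass
class Process:
    counter: int  # the process's token-ring state
    check: int    # redundancy stored alongside the state


def make_ring(n: int) -> list:
    # All counters equal: process 0 holds the only token.
    return [Process(0, checksum(0)) for _ in range(n)]


def fault_visible(p: Process) -> bool:
    # A fault is "visible" here when state and checksum disagree.
    return p.check != checksum(p.counter)


def write(p: Process, value: int) -> None:
    # Legitimate updates always refresh the checksum.
    p.counter = value
    p.check = checksum(value)


def step(ring: list, i: int, k: int) -> bool:
    """Let process i act once; return True if it changed its state."""
    me, pred = ring[i], ring[i - 1]  # ring[-1] is the predecessor of process 0
    # Hypothetical containment rule: never read from, or act on, corrupted state.
    if fault_visible(me) or fault_visible(pred):
        return False
    if i == 0:
        if me.counter == pred.counter:      # process 0 holds the token
            write(me, (me.counter + 1) % k)
            return True
        return False
    if me.counter != pred.counter:          # process i > 0 holds the token
        write(me, pred.counter)
        return True
    return False
```

In this sketch, a transient fault that changes a counter without updating its checksum is detected by the successor, which declines to copy the corrupted value. Without the check, the successor would treat the mismatch as a token and propagate the bad value around the ring.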
Keywords :
fault diagnosis; fault tolerant computing; redundancy; safety systems; system recovery; fault convergence; fault detection; fault-correcting checksum; fault-detecting checksum; self-stabilizing system; system safety; system stabilization; token ring; Computer science; Concrete; Conferences; Convergence; Distributed computing; Fault detection; Interference; Redundancy; Safety; Token networks;
Conference_Title :
Distributed Computing Systems Workshops, 2005. 25th IEEE International Conference on
Print_ISBN :
0-7695-2328-5
DOI :
10.1109/ICDCSW.2005.128