DocumentCode :
1081632
Title :
Unified approach to synchronous and asynchronous approximate agreement in the presence of hybrid faults
Author :
Kieckhafer, R.M. ; Azadmanesh, M.H.
Author_Institution :
Dept. of Comput. Sci. & Eng., Nebraska Univ., Lincoln, NE, USA
Volume :
44
Issue :
4
fYear :
1995
fDate :
12/1/1995 12:00:00 AM
Firstpage :
622
Lastpage :
631
Abstract :
An important problem in fault-tolerant distributed computer systems is maintaining agreement between nonfaulty processes in the presence of undiagnosed faults. Approximate agreement defines a condition in which it is not necessary for the agreed values to be numerically identical. Rather, processes need only agree with each other to within a predefined numerical tolerance. Convergent voting algorithms which achieve approximate agreement have been studied in the context of two classes of systems, synchronous and asynchronous. Studies have also addressed both completely connected and partially connected systems. Together, the two properties of synchrony and connectivity yield 4 different voting domains. In all studies to date, each voting domain has been treated as a separate problem. This paper: shows that for at least one broad family of voting algorithms, the 4 domains are special cases of a more general convergent voting problem; analyzes convergent voting under the 3-mode hybrid fault model of Thambidurai and Park; and presents a set of unifying relations applicable to all 4 voting domains. These relations are used to specify voting algorithms which optimize fault-tolerance, convergence rate, or computational overhead in any given voting domain. The task of designing a voting algorithm for a particular fault-tolerant system is thus greatly simplified
Keywords :
convergence of numerical methods; distributed processing; fault tolerant computing; multiprocessing systems; reliability; 3-mode hybrid fault model; Byzantine agreement; asynchronous approximate agreement; clock synchronisation; completely connected systems; convergent voting algorithms; fault tolerant multiprocessor; fault-tolerant distributed computer systems; hybrid faults; nonfaulty processes; partially connected systems; synchronous approximate agreement; undiagnosed faults; Algorithm design and analysis; Application software; Clocks; Convergence; Distributed computing; Fault tolerance; Fault tolerant systems; Partitioning algorithms; Synchronization; Voting;
fLanguage :
English
Journal_Title :
Reliability, IEEE Transactions on
Publisher :
ieee
ISSN :
0018-9529
Type :
jour
DOI :
10.1109/24.475992
Filename :
475992
Link To Document :
بازگشت