DocumentCode :
2130439
Title :
Fault-tolerant clock synchronization of large multicomputers via multistep interactive convergence
Author :
De Azevedo, Marcelo Moraes ; Blough, Douglas M.
Author_Institution :
Dept. of Electr. & Comput. Eng., California Univ., Irvine, CA, USA
fYear :
1996
fDate :
27-30 May 1996
Firstpage :
249
Lastpage :
258
Abstract :
We present a fault-tolerant algorithm that internally synchronizes clocks in multicomputer systems employing not completely connected networks (NCCNs). The algorithm is referred to as multistep interactive convergence, and is locally implemented in each node by a time server process (TSP). The algorithm proceeds in rounds, and bases its operation on a logical mapping of the system's TSPs into an m-dimensional array. A TSP executes m steps per round, each step including a call to an interactive convergence procedure. Clock readings in step i are gathered only from TSPs sharing a row along dimension i of the array, which reduces the number of messages by orders of magnitude over a conventional interactive convergence algorithm. The algorithm can be used in systems of arbitrary topology, and provides the added benefit of increased locality of communication in regular NCCNs. These advantages can be combined with a variety of message staggering mechanisms to keep network contention to a minimum. We characterize the maximum clock skew, maximum clock drift, maximum clock discontinuity, and number of messages produced by the algorithm, and show that it tolerates arbitrary faults. A comparison with other algorithms is provided.
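The message saving described in the abstract can be illustrated with a minimal counting sketch. It is not the authors' implementation: the function names are hypothetical, and it assumes that gathering one clock reading costs exactly one message, ignoring multi-hop routing, request/reply pairs, and message staggering.

```python
from itertools import product


def row_peers(coord, dims, step):
    """Coordinates of the TSPs that share a row with `coord` along
    dimension `step` of the logical array (the TSP itself excluded)."""
    return [
        coord[:step] + (k,) + coord[step + 1:]
        for k in range(dims[step])
        if k != coord[step]
    ]


def messages_multistep(dims):
    """Messages per round when every TSP gathers readings only from its
    row peers in each of the m = len(dims) steps.
    Assumption: one message per gathered clock reading."""
    all_tsps = product(*(range(d) for d in dims))
    return sum(
        len(row_peers(coord, dims, step))
        for coord in all_tsps
        for step in range(len(dims))
    )


def messages_conventional(n):
    """Messages per round when each of n TSPs gathers a reading from
    every other TSP (same one-message-per-reading assumption)."""
    return n * (n - 1)


if __name__ == "__main__":
    dims = (16, 16, 16)  # hypothetical mapping of 4096 TSPs, m = 3
    n = 16 * 16 * 16
    print(messages_multistep(dims), messages_conventional(n))
```

Under these assumptions, a 16 x 16 x 16 mapping of 4096 TSPs needs 4096 x 45 = 184,320 messages per round, against 4096 x 4095 = 16,773,120 when every TSP reads every other clock directly.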
Keywords :
fault tolerant computing; multiprocessor interconnection networks; protocols; synchronisation; fault-tolerant clock synchronization; interactive convergence algorithm; interactive convergence procedure; large multicomputers; logical mapping; m-dimensional array; maximum clock discontinuity; maximum clock skew; maximum clock drift; message staggering mechanisms; multistep interactive convergence; network contention; not completely connected networks; time server process; Clocks; Computer networks; Convergence; Costs; Fault tolerance; Fault tolerant systems; Logic arrays; Network servers; Protocols; Synchronization
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Distributed Computing Systems, 1996., Proceedings of the 16th International Conference on
Print_ISBN :
0-8186-7399-0
Type :
conf
DOI :
10.1109/ICDCS.1996.507923
Filename :
507923