DocumentCode :
3414667
Title :
A reliable hardware barrier synchronization scheme
Author :
Sivaram, Rajeev ; Stunkel, Craig B. ; Panda, Dhabaleswar K.
Author_Institution :
Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA
fYear :
1997
fDate :
1-5 Apr 1997
Firstpage :
274
Lastpage :
280
Abstract :
Barrier synchronization is a crucial operation for parallel systems. Many schemes have been proposed in the literature to achieve fast barrier synchronization through software, hardware, or a combination of these mechanisms. However, few of these schemes emphasize fault-tolerant barrier operations. In this paper, we describe inexpensive support that can be added to network switches for achieving reliable hardware-based barrier synchronization while recovering from lost or corrupted messages. Necessary modifications to the switch architecture and the associated fault-tolerant message-passing protocols are presented. The protocols are optimized for the no-fault case while providing means to detect the failure of any step of the operation and to recover from it. The proposed scheme shows significant potential for use in parallel systems, especially the emerging systems based on networks of workstations.
Keywords :
fault tolerant computing; message passing; parallel architectures; protocols; synchronisation; barrier synchronization; fault-tolerant; message-passing protocols; parallel systems; switch architecture; Application software; Delay; Fault tolerance; Fault tolerant systems; Hardware; Information science; Multiprocessor interconnection networks; Protocols; Switches; Workstations;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Processing Symposium, 1997. Proceedings., 11th International
Conference_Location :
Geneva
ISSN :
1063-7133
Print_ISBN :
0-8186-7793-7
Type :
conf
DOI :
10.1109/IPPS.1997.580908
Filename :
580908