Title :
Fault-tolerant DSM on the SOME-Bus multiprocessor architecture with message combining
Author :
Katsinis, Constantine ; Hecht, Diana
Author_Institution :
Electr. & Comput. Eng., Drexel Univ., Philadelphia, PA, USA
Abstract :
Summary form only given. We present a broadcast-based architecture called the SOME-Bus interconnection network, which directly links processor nodes without contention and can efficiently interconnect several hundred nodes. Each node has a dedicated output channel and an array of receivers, with one receiver dedicated to every other node's output channel. The SOME-Bus eliminates the need for global arbitration and provides bandwidth that scales directly with the number of nodes in the system. Under the distributed shared memory (DSM) paradigm, the SOME-Bus allows tight integration of the transmitter, receiver and cache controller hardware to produce a highly integrated system-wide cache coherence mechanism. Backward error recovery fault-tolerance techniques can exploit DSM data replication and SOME-Bus broadcasts, adding little network traffic and causing correspondingly little performance degradation. Simulation results show that in the SOME-Bus architecture under the DSM paradigm, messages tend to wait at the node output network interface. Consequently, to minimize the effect of increased network traffic, messages can be combined at the node output queue to form a new message containing the payloads of all original messages. We use simulation to examine the effect of such message combining on the performance of the SOME-Bus in the presence of additional traffic due to fault tolerance, and we compare it with the same performance measures for a reduced SOME-Bus network in which two nodes share one channel.
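The message combining described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Message` layout, the `(destination, data)` payload pairs, and the `combine_waiting` helper are all hypothetical names chosen for illustration, and the sketch assumes that every message waiting in a node's output queue may be merged into one message regardless of destination, since each node broadcasts on its own channel.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple


@dataclass
class Message:
    # Hypothetical layout: the sending node plus a list of
    # (destination node, data) payload pairs.
    source: int
    payloads: List[Tuple[int, str]]


def combine_waiting(queue: Deque[Message]) -> Optional[Message]:
    """Drain the output queue into one message that carries the
    payloads of all original messages, in arrival order."""
    if not queue:
        return None
    first = queue.popleft()
    combined = Message(first.source, list(first.payloads))
    while queue:
        combined.payloads.extend(queue.popleft().payloads)
    return combined


# Three messages are waiting at node 3's output queue while its
# channel is busy (example contents are made up).
out_queue = deque([
    Message(3, [(0, "invalidate A")]),
    Message(3, [(5, "data reply B")]),
    Message(3, [(7, "recovery copy C")]),
])
combined = combine_waiting(out_queue)
```

In this sketch the three waiting messages leave the queue as a single transmission, so only one channel acquisition is needed instead of three; how the real network interface decides when to combine is not stated in the abstract.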
Keywords :
bandwidth allocation; distributed shared memory systems; fault tolerant computing; message passing; multiprocessor interconnection networks; network interfaces; parallel architectures; system recovery; telecommunication traffic; SOME-Bus multiprocessor architecture; backward error recovery; broadcast-based architecture; cache coherence mechanism; cache controller hardware; distributed shared memory system; fault-tolerance techniques; fault-tolerant DSM; message combining; network interface; network traffic; Bandwidth; Broadcasting; Communication system traffic control; Control systems; Fault tolerance; Hardware; Multiprocessor interconnection networks; Telecommunication traffic; Traffic control; Transmitters;
Conference_Title :
Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International
Print_ISBN :
0-7695-2132-0
DOI :
10.1109/IPDPS.2004.1303240