Title :
RMPP: the reliable message passing protocol
Author :
Riesen, Rolf ; Maccabe, Arthur B.
Author_Institution :
Sandia Nat. Labs., Albuquerque, NM, USA
Abstract :
Large-scale clusters built out of commercial components face scalability obstacles similar to those of the massively parallel processors (MPPs) of the 1980s, especially when they are used for scientific computing. Their networks are descendants of the MPP networks, but the communication software in use was designed for wide-area networks with client/server applications in mind. We present a communication protocol designed specifically for large-scale clusters running a scientific application workload. The protocol takes advantage of the low error rate and high performance of these networks, and it is adapted to the peculiarities of these MPP-like networks and to the communication characteristics of scientific applications. This paper presents only the protocol itself and the ideas behind it; we refer the reader to other publications for more information about the scalability, performance, and usage of the protocol.
Keywords :
computer network reliability; message passing; natural sciences computing; protocols; workstation clusters; MPP-like networks; RMPP; communication protocol; error rate; large-scale clusters; performance; reliable message passing protocol; scientific computing; Application software; Computer network reliability; Hardware; Laboratories; Large-scale systems; Message passing; Network servers; Protocols; Scalability; Switches;
Conference_Title :
Local Computer Networks, 2002. Proceedings. LCN 2002. 27th Annual IEEE Conference on
Print_ISBN :
0-7695-1591-6
DOI :
10.1109/LCN.2002.1181843