Title :
PCODE: an efficient and reliable collective communication protocol for unreliable broadcast domain
Author :
Bruck, Jehoshua ; Dolev, Danny ; Ho, Ching-Tien ; Orni, Rimon ; Strong, Ray
Author_Institution :
California Inst. of Technol., Pasadena, CA, USA
Abstract :
Existing programming environments for clusters are typically built on top of a point-to-point communication layer (send and receive) over local area networks (LANs) and, as a result, suffer from poor performance in the collective communication part. For example, a broadcast that is implemented using a TCP/IP protocol (which is a point-to-point protocol) over a LAN is obviously inefficient, as it does not exploit the fact that the LAN is a broadcast medium. We have observed that the main difference between a distributed computing paradigm and a message-passing parallel computing paradigm is that, in a distributed environment, the activity of every processor is independent, while in a parallel environment the collection of the user-communication layers in the processors can be modeled as a single global program. We have formalized the requirements by defining the notion of a correct global program. This notion provides a precise specification of the interface between the transport layer and the user-communication layer. We have developed PCODE, a new communication protocol that is driven by a global program, and proved its correctness. We have implemented the PCODE protocol on a collection of IBM RS/6000 workstations and on a collection of Silicon Graphics Indigo workstations, both communicating via UDP broadcast. The experimental results we obtained indicate that the performance advantage of PCODE over the current point-to-point approach (TCP) can be as high as an order of magnitude on a cluster of 16 workstations.
Keywords :
local area networks; message passing; transport protocols; LAN; PCODE; Silicon Graphics Indigo workstations; broadcast; communication protocol; point-to-point protocol; programming environments; unreliable broadcast domain; Broadcasting; Distributed computing; Local area networks; Message passing; Parallel processing; Programming environments; Protocols; TCP/IP; Telecommunication network reliability; Workstations
Conference_Title :
Parallel Processing Symposium, 1995. Proceedings., 9th International
Conference_Location :
Santa Barbara, CA
Print_ISBN :
0-8186-7074-6
DOI :
10.1109/IPPS.1995.395924