DocumentCode :
3215297
Title :
Scaling all-to-all multicast on fat-tree networks
Author :
Kumar, Sameer ; Kalé, Laxmikant V.
Author_Institution :
Dept. of Comput. Sci., Illinois Univ., Urbana-Champaign, IL, USA
fYear :
2004
fDate :
7-9 July 2004
Firstpage :
205
Lastpage :
214
Abstract :
In this paper, we study the all-to-all multicast operation. Strategies for all-to-all multicast need to differ for small and large messages. For small messages, the major issue is minimizing software overhead, whereas for large messages the issue is network contention. Many modern large parallel computers use the fat-tree interconnection topology. We therefore analyze network contention on fat-tree networks and develop strategies to optimize collective multicast, using known contention-free communication schedules on fat-tree networks in the design of two of these strategies. We evaluate the performance of these strategies with up to 256 nodes (1024 processors) on an Alpha cluster. We present schemes that perform well when a contiguous chunk of nodes is not available. For large messages, many of our strategies achieve twice the throughput of native MPI. We also demonstrate that the software overhead of a collective operation is a small fraction of the total completion time in the presence of the communication coprocessor. We therefore compare the performance of the studied strategies using two metrics: (i) completion time and (ii) computation overhead.
Keywords :
computer networks; message passing; multicast communication; performance evaluation; telecommunication network topology; trees (mathematics); workstation clusters; MPI; all-to-all multicast; alpha cluster; collective multicast; communication coprocessor; contention free communication scheduling; fat-tree interconnection topology; fat-tree networks; network contention; parallel computers; software overhead; Bandwidth; Computer science; Concurrent computing; Coprocessors; Costs; Design optimization; Network topology; Processor scheduling; Quantum computing; Throughput;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Systems, 2004. ICPADS 2004. Proceedings. Tenth International Conference on
ISSN :
1521-9097
Print_ISBN :
0-7695-2152-5
Type :
conf
DOI :
10.1109/ICPADS.2004.1316097
Filename :
1316097