Title :
Evaluating MPI collective communication on the SP2, T3D, and Paragon multicomputers
Author :
Hwang, Kai ; Wang, Choming ; Wang, Cho-Li
Author_Institution :
Hong Kong Univ., Hong Kong
Abstract :
We evaluate the architectural support for collective communication operations on the IBM SP2, Cray T3D, and Intel Paragon. The MPI performance data were obtained from STAP benchmark experiments performed jointly at USC and HKU. The T3D clearly demonstrated the best timing performance in almost all collective operations. This is attributed to the special hardware built into the T3D for fast messaging and block data transfer. With hardwired barriers, the T3D performs barrier synchronization in 3 μs, at least 30 times faster than the SP2 or Paragon. The startup latency of collective operations increases either linearly or logarithmically on the three multicomputers. For short messages, the SP2 outperforms the Paragon in the barrier, total exchange, scatter, and gather operations. Various collective operations with 64 KBytes per message over 64 nodes of the three machines complete in times ranging from 5.12 ms to 675 ms. The Paragon outperforms the SP2 in almost all collective operations with long messages. We have derived closed-form expressions to quantify the collective messaging times and aggregated bandwidth on all three machines. For total exchange with 64 nodes, the T3D, Paragon, and SP2 achieved aggregated bandwidths of 1.745, 0.879, and 0.818 GBytes/s, respectively. These findings are useful to those who wish to predict MPP performance or to optimize parallel applications through trade-offs between divided computation and collective communication.
Keywords :
message passing; multiprocessing systems; performance evaluation; synchronisation; timing; Cray T3D; IBM SP2; Intel Paragon; MPI collective communication evaluation; Paragon multicomputers; STAP benchmark experiments; architectural support; closed-form expressions; startup latency; timing performance; Bandwidth; Closed-form solution; Concurrent computing; Delay effects; Hardware; Message passing; Scattering; Size measurement; Timing; Workstations;
Conference_Title :
High-Performance Computer Architecture, 1997., Third International Symposium on
Conference_Location :
San Antonio, TX
Print_ISBN :
0-8186-7764-3
DOI :
10.1109/HPCA.1997.569646