Title :
MPIActor - A Multicore-Architecture Adaptive and Thread-Based MPI Program Accelerator
Author :
Liu, Zhiqiang ; Ren, Kaijun ; Song, Junqiang
Author_Institution :
Coll. of Comput., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
Adapting foundational MPI software to multicore systems is a key issue in developing effective parallel software in the high-performance communication domain. To address this issue, we propose a novel technique called MPI Accelerator, or MPIActor for short, a transparent middleware that enhances conventional MPI libraries. The main idea is to optimize MPI routines for multicore systems by adopting a threaded MPI mechanism and multicore-architecture-aware collectives. With MPIActor, on the one hand, all MPI processes in each node are mapped to several threads in one process, which greatly decreases the overhead of intra-node point-to-point communication. On the other hand, collective routines are implemented through the cooperation of separate intra-node and inter-node collective subroutines, and the intra-node collective subroutines can be further optimized with multicore-architecture-aware collective algorithms. Based on this idea, a framework comprising an MPI_Reduce routine and a set of point-to-point communication routines has been implemented and evaluated on a 256-core Nehalem platform. Compared with MVAPICH2, the experimental results show that MPIActor significantly improves performance, both on the OSU_LATENCY benchmark for point-to-point communication and on the IMB Reduce benchmark for reduction collectives. In particular, performance on the OSU_LATENCY benchmark improves by up to 321%.
Keywords :
benchmark testing; message passing; middleware; multi-threading; multiprocessing systems; parallel architectures; software libraries; subroutines; IMB reduce benchmark; MPI foundational software; MPI library; MPI reduce routine; MPIActor; MVAPICH2; OSU LATENCY benchmark; high performance communication domain; internode collective subroutine; intranode collective subroutine; intranode point-to-point communication; multicore architecture adaptive; parallel software; thread based MPI program accelerator; transparent middleware; Communication; MPI Accelerator; MPIActor; Multicore-Architecture Adaptive Collective; Threaded MPI;
Conference_Title :
High Performance Computing and Communications (HPCC), 2010 12th IEEE International Conference on
Conference_Location :
Melbourne, VIC
Print_ISBN :
978-1-4244-8335-8
Electronic_ISBN :
978-0-7695-4214-0
DOI :
10.1109/HPCC.2010.89