Designing High Performance and Scalable MPI Intra-node Communication Support for Clusters

Author

Chai, Lei ; Hartono, Albert ; Panda, Dhabaleswar K.

Author_Institution

Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH

fYear

2006

fDate

25-28 Sept. 2006

Firstpage

1

Lastpage

10

Abstract

As processor and memory architectures advance, clusters are increasingly built from larger SMP systems, which makes MPI intra-node communication a critical issue in high performance computing. This paper presents a new design for MPI intra-node communication that aims to achieve both high performance and good scalability in a cluster environment. The design distinguishes small and large messages and handles them differently, minimizing the data transfer overhead for small messages and the memory space consumed by large messages. Moreover, the design uses the cache efficiently and requires no locking mechanisms, so it performs well even at large system sizes. This paper also explores various optimization strategies to reduce polling overhead and maintain data locality. We have evaluated our design on NUMA (non-uniform memory access) and dual-core NUMA systems. The experimental results on the NUMA system show that the new design can improve MPI intra-node latency by up to 35% and bandwidth by up to 50% compared to MVAPICH. While running the bandwidth benchmark, the measured L2 cache miss rate is reduced by half. The new design also improves the performance of MPI collective calls by up to 25%. The results on the dual-core NUMA system show that the new design can achieve 0.48 μs in CMP latency.

Keywords

message passing; shared memory systems; workstation clusters; L2 cache miss rate; bandwidth benchmark; cluster computing; dual core NUMA system; high performance MPI intra-node communication support; larger SMP systems; memory architectures; multicore processor; nonuniform memory access systems; scalable MPI intra-node communication support; Bandwidth; Computer architecture; Computer science; Delay; Design engineering; High performance computing; Memory architecture; Multicore processing; Scalability; Sun; Cluster Computing; Intra-node Communication; MPI; Multi-core Processor; Non-Uniform Memory Access (NUMA)

fLanguage

English

Publisher

IEEE

Conference_Title

Cluster Computing, 2006 IEEE International Conference on

Conference_Location

Barcelona

ISSN

1552-5244

Print_ISBN

1-4244-0327-8

Electronic_ISBN

1552-5244

Type

conf

DOI

10.1109/CLUSTR.2006.311850

Filename

4100356