Runtime Optimization of Broadcast Communications Using Dynamic Network Topology Information from MPI

Author

Godwin, Jeffrey ; Karlsson, Christer ; Chen, Zizhong

Author_Institution

Colorado Sch. of Mines, Golden, CO, USA

fYear

2012

fDate

25-27 June 2012

Firstpage

287

Lastpage

294

Abstract

Modern commodity compute clusters are often composed of many multi-core nodes that are connected to each other via a network. On multi-core clusters, inter-node network communications are typically an order of magnitude slower than communications between processes on the same node, which effectively creates a heterogeneous, tiered network topology. Presently, most MPI implementations assume a homogeneous network composition, which leads to suboptimal performance on multi-core clusters. In this paper, we treat a multi-core cluster as a heterogeneous cluster and optimize the performance of MPI broadcast communications by scheduling messages according to topology information. We experimentally demonstrate that previous heuristics for heterogeneous clusters, such as Fastest Edge First (FEF), do not produce optimal results on multi-core clusters for broadcast communications. Our solution is to modify the Fastest Edge First heuristic by imposing an additional constraint that permits only one core per node to participate in inter-node communications, creating a nested binomial tree structure. Using this constraint, we achieve performance gains of 20% to 60% over the MPI broadcast implementation on homogeneous, multi-core clusters.
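The nested binomial tree structure described in the abstract can be illustrated with a small scheduling sketch. This is not the authors' implementation and it does not model the Fastest Edge First edge weighting; it only shows the constraint that one process per node (a "leader") takes part in inter-node transfers, with a second binomial tree inside each node. The function names and the `node_of_rank` input (a list mapping each rank to a node id) are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def binomial_schedule(members):
    """Return broadcast rounds as lists of (sender, receiver) pairs.

    members[0] is the root. In each round, every member that already
    holds the message forwards it to one member that does not.
    """
    rounds = []
    have = 1  # number of members that currently hold the message
    n = len(members)
    while have < n:
        step = []
        for i in range(have):
            j = i + have
            if j < n:
                step.append((members[i], members[j]))
        rounds.append(step)
        have *= 2
    return rounds


def nested_binomial_schedule(node_of_rank, root=0):
    """Build a two-level schedule (assumed input: rank -> node id list).

    Returns (inter, intra):
      inter: binomial rounds among one leader rank per node
      intra: dict of node id -> binomial rounds among that node's ranks
    """
    by_node = defaultdict(list)
    for rank, node in enumerate(node_of_rank):
        by_node[node].append(rank)

    # Make the broadcast root the leader of its own node.
    root_node = node_of_rank[root]
    by_node[root_node].remove(root)
    by_node[root_node].insert(0, root)

    # Only the first rank of each node takes part in inter-node traffic.
    nodes = [root_node] + [n for n in by_node if n != root_node]
    leaders = [by_node[n][0] for n in nodes]

    inter = binomial_schedule(leaders)
    intra = {n: binomial_schedule(by_node[n]) for n in nodes}
    return inter, intra
```

For example, with three nodes of four cores each (`node_of_rank = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]`), the inter-node phase involves only ranks 0, 4, and 8, and each node then completes its own broadcast locally. A real implementation would also decide how to overlap the two phases, which this sketch leaves open.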

Keywords

broadcast communication; computer communications software; message passing; multiprocessing systems; network topology; optimisation; processor scheduling; tree data structures; FEF heuristic; MPI broadcast communications; dynamic network topology information; fastest edge first heuristic; heterogeneous cluster; heterogeneous network topology; homogeneous network composition; inter-node network communications; message scheduling; modern commodity compute clusters; multicore clusters; multicore nodes; nested binomial tree structure; performance optimization; runtime optimization; tiered network topology; Benchmark testing; Clustering algorithms; Multicore processing; Network topology; Schedules; Timing; Topology; Broadcast; Cluster; Fastest Edges First; Message Passing Interface (MPI); Multicore;

fLanguage

English

Publisher

IEEE

Conference_Title

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS)

Conference_Location

Liverpool

Print_ISBN

978-1-4673-2164-8

Type

conf

DOI

10.1109/HPCC.2012.46

Filename

6332186