Regional Information Center for Science and Technology - Hardware implementation of MPI

DocumentCode :

3535479

Title :

Hardware implementation of MPI_Barrier on an FPGA cluster

Author :

Gao, Shanyuan ; Schmidt, Andrew G. ; Sass, Ron

Author_Institution :

Electr. & Comput. Eng. Dept., Univ. of North Carolina at Charlotte, Charlotte, NC, USA

fYear :

2009

fDate :

Aug. 31 2009-Sept. 2 2009

Abstract :

Message passing is the dominant programming model for distributed-memory parallel computers, and the Message-Passing Interface (MPI) is its standard. Along with point-to-point send and receive primitives, MPI includes a set of collective communication operations that are used to synchronize and coordinate groups of tasks. MPI_Barrier, one of the most important collective procedures, has been extensively studied on a variety of architectures over the last twenty years. However, a cluster of Platform FPGAs is a new architecture and offers interesting, resource-efficient options for implementing the barrier operation. This paper describes an FPGA implementation of MPI_Barrier. The premise is that the barrier (like other collective communication operations) is very sensitive to latency as the number of nodes scales to tens of thousands, and the relatively slow processors found on FPGAs will significantly cap performance. The FPGA hardware design implements a tree-based algorithm and is tightly integrated with the custom high-speed on-chip/off-chip network. MPI access is provided through a specially designed kernel module, which effectively offloads the work from the CPU and OS into hardware. The evaluation of this design shows significant performance gains compared with a conventional software implementation on both an FPGA cluster and a commodity cluster. Further, it suggests that moving other MPI collective operations into hardware would be beneficial.
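The abstract states only that the hardware implements "a tree-based algorithm"; it does not give the tree shape or the signaling scheme. As a rough illustration of why a tree helps with latency, here is a minimal software model built on several assumptions: a heap-ordered k-ary tree rooted at rank 0, a one-shot (non-reusable) barrier, Python threads standing in for cluster nodes, and `threading.Event` standing in for network messages. The names `children`, `parent`, `tree_depth`, and `TreeBarrier` are hypothetical and are not taken from the paper.

```python
import threading
from typing import List, Optional


def children(rank: int, n: int, k: int = 2) -> List[int]:
    """Children of `rank` in a k-ary tree laid out in heap order (root = 0)."""
    first = k * rank + 1
    return [c for c in range(first, first + k) if c < n]


def parent(rank: int, k: int = 2) -> Optional[int]:
    """Parent of `rank` in the same heap-ordered tree (None for the root)."""
    return None if rank == 0 else (rank - 1) // k


def tree_depth(n: int, k: int = 2) -> int:
    """Edges from the deepest rank (n - 1) up to the root."""
    depth, r = 0, n - 1
    while r > 0:
        r = parent(r, k)
        depth += 1
    return depth


class TreeBarrier:
    """One-shot tree barrier: gather arrivals up the tree, then broadcast release down."""

    def __init__(self, n: int, k: int = 2):
        self.n, self.k = n, k
        # arrived[r] is set once every node in r's subtree has reached the barrier.
        self.arrived = [threading.Event() for _ in range(n)]
        # released[r] is set once r is allowed to leave the barrier.
        self.released = [threading.Event() for _ in range(n)]

    def wait(self, rank: int) -> None:
        kids = children(rank, self.n, self.k)
        # Gather phase: block until each child's whole subtree has arrived.
        for c in kids:
            self.arrived[c].wait()
        if rank == 0:
            # Root: all n nodes have arrived, so start the release wave.
            self.released[0].set()
        else:
            # Report this subtree to the parent, then wait for the release signal.
            self.arrived[rank].set()
            self.released[rank].wait()
        # Release phase: pass the release signal down to the children.
        for c in kids:
            self.released[c].set()


if __name__ == "__main__":
    n = 8
    barrier = TreeBarrier(n)
    log, lock = [], threading.Lock()

    def node(rank: int) -> None:
        with lock:
            log.append(("arrive", rank))
        barrier.wait(rank)
        with lock:
            log.append(("leave", rank))

    threads = [threading.Thread(target=node, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Every "arrive" entry precedes every "leave" entry.
    print([kind for kind, _ in log])
    print("hops on the critical path:", 2 * tree_depth(n))
```

In this model a barrier completes after one gather wave and one release wave, so the critical path is about `2 * tree_depth(n, k)` hops (for example, 20 hops for 1,024 nodes in a binary tree), rather than the roughly `n - 1` serialized messages a single central coordinator would have to handle. This logarithmic growth is the usual argument for tree-based barriers at large node counts; the per-hop cost in the paper's design depends on its custom network and is not modeled here.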

Keywords :

application program interfaces; field programmable gate arrays; message passing; microprocessor chips; FPGA cluster; MPI_barrier operation; collective communication operation; distributed memory parallel computer; kernel module; message passing interface; on-chip/off-chip network; tree-based algorithm; Algorithm design and analysis; Computer interfaces; Concurrent computing; Delay; Distributed computing; Field programmable gate arrays; Hardware; Kernel; Network-on-a-chip; Parallel programming;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Field Programmable Logic and Applications, 2009. FPL 2009. International Conference on

Conference_Location :

Prague

ISSN :

1946-1488

Print_ISBN :

978-1-4244-3892-1

Electronic_ISBN :

1946-1488

Type :

conf

DOI :

10.1109/FPL.2009.5272560

Filename :

5272560

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3535479