Title :
Impact of reconfigurable hardware on accelerating MPI_Reduce
Author :
Gao, Shanyuan ; Schmidt, Andrew G. ; Sass, Ron
Author_Institution :
Reconfigurable Comput. Syst. Lab., Univ. of North Carolina at Charlotte, Charlotte, NC, USA
Abstract :
This paper demonstrates the benefits and pitfalls of implementing the collective communication operation reduce in the reconfigurable resources of an FPGA device across a cluster of all-FPGA compute nodes. Specifically, the communication and computation semantics of the MPI_Reduce call from the de facto standard Message-Passing Interface have been implemented. Using a synthetic benchmark, a cluster of 32 FPGA nodes with a 300 MHz PowerPC processor, custom high-speed network, and reduce core is compared against a conventional commodity cluster with 3.2 GHz Xeon processors and Gigabit Ethernet. The design is customized to support many reduce operations on small datasets, an increasingly common demand from domain scientists, while minimizing the amount of on-chip resources used. Speedups of ≈2x to ≈800x over the commodity cluster are reported for small datasets, which provides significant motivation to continue investigating support for additional collective communication operations directly in hardware.
Keywords :
field programmable gate arrays; message passing; reconfigurable architectures; 3.2 GHz Xeon processors; 300 MHz PowerPC processor; FPGA device; Gigabit Ethernet; MPI_Reduce; message-passing interface; reconfigurable hardware; Ethernet networks; Field programmable gate arrays; Hardware; Optimization; Peer to peer computing; Software; Topology
Conference_Title :
Field-Programmable Technology (FPT), 2010 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-8980-0
DOI :
10.1109/FPT.2010.5681537