DocumentCode
3459504
Title
Implementation of Parallel 1-D FFT on GPU Clusters
Author
Takahashi, Dr Takakazu
Author_Institution
Fac. of Eng., Inf. & Syst., Univ. of Tsukuba, Tsukuba, Japan
fYear
2013
fDate
3-5 Dec. 2013
Firstpage
174
Lastpage
180
Abstract
In this paper, we propose an implementation of a parallel one-dimensional fast Fourier transform (FFT) on GPU clusters. This implementation is based on the six-step FFT algorithm. Because the parallel one-dimensional FFT requires three all-to-all communications, one goal for parallel FFTs on GPU clusters is to minimize the PCI Express transfer time and the MPI communication time. We demonstrate that the advanced features of MVAPICH2-GPU make it easy to overlap PCI Express transfers and MPI communication. Performance results of one-dimensional FFTs on a GPU cluster are reported. We successfully achieved a performance of over 763 GFlops on 128 nodes of the HA-PACS (268 nodes, 2.99 TFlops/node, 802 TFlops peak performance) for a 2^34-point FFT.
Keywords
application program interfaces; fast Fourier transforms; graphics processing units; mathematics computing; message passing; parallel algorithms; pattern clustering; GPU clusters; HA-PACS; MPI communication time minimization; MVAPICH2-GPU; PCI Express transfer time minimization; all-to-all communications; graphics processing unit; parallel 1D FFT; parallel one-dimensional fast Fourier transform; six-step FFT algorithm; Arrays; Clustering algorithms; Equations; Graphics processing units; Indexes; Kernel; Performance evaluation; Fast Fourier transform; GPU cluster; all-to-all communication
fLanguage
English
Publisher
ieee
Conference_Titel
Computational Science and Engineering (CSE), 2013 IEEE 16th International Conference on
Conference_Location
Sydney, NSW
Type
conf
DOI
10.1109/CSE.2013.36
Filename
6755214