• DocumentCode
    3459504
  • Title

    Implementation of Parallel 1-D FFT on GPU Clusters

  • Author

    Takahashi, Dr Takakazu

  • Author_Institution
    Fac. of Eng., Inf. & Syst., Univ. of Tsukuba, Tsukuba, Japan
  • fYear
    2013
  • fDate
    3-5 Dec. 2013
  • Firstpage
    174
  • Lastpage
    180
  • Abstract
    In this paper, we propose an implementation of a parallel one-dimensional fast Fourier transform (FFT) on GPU clusters. This implementation is based on the six-step FFT algorithm. Because the parallel one-dimensional FFT requires three all-to-all communications, one goal for parallel FFTs on GPU clusters is to minimize the PCI Express transfer time and the MPI communication time. We demonstrate that the advanced features of MVAPICH2-GPU make it easy to overlap PCI Express transfers and MPI communication. Performance results of one-dimensional FFTs on a GPU cluster are reported. We successfully achieved a performance of over 763 GFlops on 128 nodes of the HA-PACS (268 nodes, 2.99 TFlops/node, 802 TFlops peak performance) for 2^34-point FFT.
  • Keywords
    application program interfaces; fast Fourier transforms; graphics processing units; mathematics computing; message passing; parallel algorithms; pattern clustering; GPU clusters; HA-PACS; MPI communication time minimization; MVAPICH2-GPU; PCI Express transfer time minimization; all-to-all communications; graphics processing unit; parallel 1D FFT; parallel one-dimensional fast Fourier transform; six-step FFT algorithm; Arrays; Clustering algorithms; Equations; Graphics processing units; Indexes; Kernel; Performance evaluation; Fast Fourier transform; GPU cluster; all-to-all communication;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Science and Engineering (CSE), 2013 IEEE 16th International Conference on
  • Conference_Location
    Sydney, NSW
  • Type

    conf

  • DOI
    10.1109/CSE.2013.36
  • Filename
    6755214