• DocumentCode
    68994
  • Title

    An Optimized FFT-Based Direct Poisson Solver on CUDA GPUs

  • Author

    Jing Wu ; JaJa, Joseph ; Balaras, Elias

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Maryland, College Park, MD, USA
  • Volume
    25
  • Issue
    3
  • fYear
    2014
  • fDate
    Mar-14
  • Firstpage
    550
  • Lastpage
    559
  • Abstract
    A highly multithreaded FFT-based direct Poisson solver that makes effective use of the capabilities of the current NVIDIA graphics processing units (GPUs) is presented. Our algorithms carefully manage the multiple layers of the memory hierarchy of the GPUs such that almost all the global memory accesses are coalesced into 128-byte device memory transactions, and all computations are carried out directly on the registers. A new strategy to interleave the FFT computation along each dimension with other computations is used to minimize the total number of accesses to the 3D grid. We illustrate the performance of our algorithms on the NVIDIA Tesla and Fermi architectures for a wide range of grid sizes, up to the largest size that can fit on the device memory ($(512times 512times 512)$ on the Tesla C1060/C2050 and $(512times 256times 256)$ on the GeForce GTX 280/480). We achieve up to 140 GFLOPS and a bandwidth of 70 GB/s on the Tesla C1060, and up to 375 GFLOPS with a bandwidth of 120GB/s on the GTX 480. The performance of our algorithms is superior to what can be achieved using the CUDA FFT library in combination with well-known parallel algorithms for solving tridiagonal linear systems of equations.
  • Keywords
    fast Fourier transforms; graphics processing units; linear systems; parallel algorithms; parallel architectures; stochastic processes; storage management; 3D grid; CUDA FFT library; CUDA GPU; FFT computation; GFLOPS; GeForce GTX 280/480; NVIDIA Fermi architectures; NVIDIA Tesla architectures; NVIDIA graphics processing units; device memory; global memory accesses; memory hierarchy; memory transactions; multithreaded FFT-based direct Poisson solver; optimized FFT-based direct Poisson solver; parallel algorithms; tridiagonal linear systems; Computer architecture; Equations; Graphics processing units; Instruction sets; Kernel; Linear systems; Vectors; Fast-Fourier transforms; elliptic equations; parallel and vector implementations;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2013.53
  • Filename
    6470608