• DocumentCode
    68964
  • Title

    Multi-FPGA Accelerator for Scalable Stencil Computation with Constant Memory Bandwidth

  • Author

    Sano, Ko ; Hatsuda, Yoshiaki ; Yamamoto, Seiichi

  • Author_Institution
    Grad. Sch. of Inf. Sci., Tohoku Univ., Sendai, Japan
  • Volume
    25
  • Issue
    3
  • fYear
    2014
  • fDate
    March 2014
  • Firstpage
    695
  • Lastpage
    705
  • Abstract
    Stencil computation is one of the most important kernels in scientific computing. However, its sustained performance is limited by memory bandwidth, especially on multicore microprocessors and graphics processing units (GPUs), because of the small operational intensity of stencil kernels. In this paper, we present a custom computing machine (CCM), called a scalable streaming-array (SSA), for high-performance stencil computations with multiple field-programmable gate arrays (FPGAs). We design SSA based on a domain-specific programmable concept, where CCMs are programmable with the minimum functionality required for an algorithm domain. We employ a deep pipelining approach over successive iterations to achieve linear scalability for multiple devices with a constant memory bandwidth. A prototype implementation using nine FPGAs agrees well with a performance model and achieves 260 and 236 GFlop/s for 2D and 3D Jacobi computation, respectively, which are 87.4 and 83.9 percent of the peak, with a memory bandwidth of only 2.0 GB/s. We also evaluate the performance of SSA for state-of-the-art FPGAs.
  • Keywords
    field programmable gate arrays; parallel processing; storage management; CCM; GPU; Jacobi computation; SSA; custom computing machine; deep pipelining approach; domain-specific programmable concept; field programmable gate array; graphics processing unit; high-performance stencil computations; memory bandwidth; multiFPGA accelerator; multicore microprocessors; scalable stencil computation; scalable streaming-array; scientific computations; Arrays; Bandwidth; Computational modeling; Field programmable gate arrays; Hardware; Scalability; FPGA; Scalable streaming-array; custom computing machine; high-performance computation; stencil computation;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2013.51
  • Filename
    6470606
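
The performance figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is only an illustration: it assumes that "percent of the peak" means sustained GFlop/s divided by peak GFlop/s, and that the 2.0 GB/s memory bandwidth applies to both the 2D and 3D runs. The function names are hypothetical and do not come from the paper.

```python
# Figures quoted in the abstract for the nine-FPGA prototype.
SUSTAINED_GFLOPS = {"2D Jacobi": 260.0, "3D Jacobi": 236.0}
FRACTION_OF_PEAK = {"2D Jacobi": 0.874, "3D Jacobi": 0.839}
MEMORY_BANDWIDTH_GBS = 2.0  # assumed to apply to both runs


def implied_peak_gflops(sustained_gflops, fraction_of_peak):
    """Peak performance implied by a sustained figure and its share of peak."""
    return sustained_gflops / fraction_of_peak


def flops_per_byte(sustained_gflops, bandwidth_gbs):
    """Floating-point operations sustained per byte of memory traffic."""
    return sustained_gflops / bandwidth_gbs


for kernel, sustained in SUSTAINED_GFLOPS.items():
    peak = implied_peak_gflops(sustained, FRACTION_OF_PEAK[kernel])
    ratio = flops_per_byte(sustained, MEMORY_BANDWIDTH_GBS)
    print(f"{kernel}: implied peak {peak:.1f} GFlop/s, {ratio:.0f} flop/byte")
```

Under these assumptions the implied peaks are roughly 297 and 281 GFlop/s, and the prototype sustains on the order of 118 to 130 floating-point operations per byte of memory traffic. The two implied peaks differ; this record does not say why.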