• DocumentCode
    1120052
  • Title

    High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs

  • Author

    Zhuo, Ling ; Morris, Gerald R. ; Prasanna, Viktor K.

  • Author_Institution
    Univ. of Southern California, Los Angeles
  • Volume
    18
  • Issue
    10
  • fYear
    2007
  • Firstpage
    1377
  • Lastpage
    1392
  • Abstract
    Field-programmable gate arrays (FPGAs) have become an attractive option for accelerating scientific applications. Many scientific operations such as matrix-vector multiplication and dot product involve the reduction of a sequentially produced stream of values. Unfortunately, because of the pipelining in FPGA-based floating-point units, data hazards may occur during these sequential reduction operations. Improperly designed reduction circuits can adversely impact the performance, impose unrealistic buffer requirements, and consume a significant portion of the FPGA. In this paper, we identify two basic methods for designing serial reduction circuits: the tree-traversal method and the striding method. Using accumulation as an example, we analyze the design trade-offs among the number of adders, buffer size, and latency. We then propose high-performance and area-efficient designs using each method. The proposed designs reduce multiple sets of sequentially delivered floating-point values without stalling the pipeline or imposing unrealistic buffer requirements. Using a Xilinx Virtex-II Pro FPGA as the target device, we implemented our designs and present performance and area results.
  • Keywords
    adders; field programmable gate arrays; logic design; pipeline processing; Xilinx Virtex-II Pro FPGA; adder; buffer size; field-programmable gate array; floating-point values; high-performance reduction circuits; parallel algorithm; pipelined operators; reconfigurable hardware; serial reduction circuits; striding method; tree-traversal method; Acceleration; Adders; Circuits; Clocks; Delay; Design methodology; Field programmable gate arrays; Hazards; Parallel processing; Pipeline processing; C.3.e Reconfigurable hardware; G.1.0.g Parallel algorithms;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2007.1068
  • Filename
    4302726