• DocumentCode
    146644
  • Title
    Design of a Coarse-Grained Processing Element for Matrix Multiplication on FPGA
  • Author
    Okuyama, Yuichi ; Takano, Shigeru ; Shirai, Tokimasa
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Univ. of Aizu, Aizu-wakamatsu, Japan
  • fYear
    2014
  • fDate
    23-25 Sept. 2014
  • Firstpage
    237
  • Lastpage
    241
  • Abstract
    In this paper, we discuss and evaluate the grain size of the processing element (PE) of Rapid MatriX, a matrix-operation-specific architecture with fused multiply-add (FMA) units, on FPGAs. Recent FPGAs have many DSP blocks, which are high-performance arithmetic units. To use these DSP blocks efficiently when implementing functional units for matrix operations in the array structure of Rapid MatriX, we propose increasing the grain size of the FMA unit. We implement Rapid MatriX with the refined PEs on an FPGA and evaluate the clock frequencies and the number of clock cycles required for calculation. As a result, for 8×8 matrix multiplication, the throughput of the PE with 4×4 matrix FMA is 3.14 times that of the original PEs with scalar FMA.
  • Keywords
    arithmetic; clocks; field programmable gate arrays; integrated circuit design; matrix multiplication; DSP blocks; FMA units; FPGA; RapidMatriX; array structure; clock cycles; clock frequencies; coarse-grained processing element design; field-programmable gate array; functional units; fused multiply add units; grain size; high-performance arithmetic units; matrix multiplication; Arrays; Clocks; Digital signal processing; Equations; Field programmable gate arrays; Registers; FPGA; SIMD processor; matrix multiplication;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Embedded Multicore/Manycore SoCs (MCSoc), 2014 IEEE 8th International Symposium on
  • Conference_Location
    Aizu-Wakamatsu
  • Type
    conf
  • DOI
    10.1109/MCSoC.2014.41
  • Filename
    6949477