DocumentCode
146644
Title
Design of a Coarse-Grained Processing Element for Matrix Multiplication on FPGA
Author
Okuyama, Yuichi ; Takano, Shigeru ; Shirai, Tokimasa
Author_Institution
Dept. of Comput. Sci. & Eng., Univ. of Aizu, Aizu-wakamatsu, Japan
fYear
2014
fDate
23-25 Sept. 2014
Firstpage
237
Lastpage
241
Abstract
In this paper, we discuss and evaluate the grain size of the processing element (PE) of Rapid MatriX, a matrix-operation-specific architecture with fused multiply-add (FMA) units, on FPGAs. Recent FPGAs have many DSP blocks, which are high-performance arithmetic units. To implement functional units for matrix operations in the array structure of Rapid MatriX, we propose using the DSP blocks efficiently by increasing the grain size of the FMA unit. We implement Rapid MatriX with the refined PEs on an FPGA and evaluate the clock frequencies and the number of clock cycles required for calculation. As a result, for 8 × 8 matrix multiplication, the throughput of the PE for 4 × 4 matrix FMA is 3.14 times that of the original scalar-FMA PEs.
Keywords
arithmetic; clocks; field programmable gate arrays; integrated circuit design; matrix multiplication; DSP blocks; FMA units; FPGA; RapidMatriX; array structure; clock cycles; clock frequencies; coarse-grained processing element design; field-programmable gate array; functional units; fused multiply add units; grain size; high-performance arithmetic units; matrix multiplication; Arrays; Clocks; Digital signal processing; Equations; Field programmable gate arrays; Registers; FPGA; SIMD processor; matrix multiplication;
fLanguage
English
Publisher
IEEE
Conference_Titel
Embedded Multicore/Manycore SoCs (MCSoC), 2014 IEEE 8th International Symposium on
Conference_Location
Aizu-Wakamatsu
Type
conf
DOI
10.1109/MCSoC.2014.41
Filename
6949477