DocumentCode :
146644
Title :
Design of a Coarse-Grained Processing Element for Matrix Multiplication on FPGA
Author :
Okuyama, Yuichi ; Takano, Shigeru ; Shirai, Tokimasa
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Aizu, Aizu-Wakamatsu, Japan
fYear :
2014
fDate :
23-25 Sept. 2014
Firstpage :
237
Lastpage :
241
Abstract :
In this paper, we discuss and evaluate the grain size of the processing element (PE) of Rapid MatriX, a matrix-operation-specific architecture with fused multiply-add (FMA) units, on FPGAs. Recent FPGAs have many DSP blocks, which are high-performance arithmetic units. By implementing functional units for matrix operations in the array structure of Rapid MatriX, we propose to use DSP blocks efficiently by increasing the grain size of the FMA unit. We implement Rapid MatriX using the refined PEs on an FPGA. In addition, we evaluate the clock frequencies and the clock cycles of calculation. As a result, the throughput of the PE for 4×4 matrix FMA is 3.14 times that of the original scalar-FMA PEs for 8×8 matrix multiplication.
Keywords :
arithmetic; clocks; field programmable gate arrays; integrated circuit design; matrix multiplication; DSP blocks; FMA units; FPGA; RapidMatriX; array structure; clock cycles; clock frequencies; coarse-grained processing element design; field-programmable gate array; functional units; fused multiply add units; grain size; high-performance arithmetic units; matrix multiplication; Arrays; Clocks; Digital signal processing; Equations; Field programmable gate arrays; Registers; FPGA; SIMD processor; matrix multiplication;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Embedded Multicore/Manycore SoCs (MCSoC), 2014 IEEE 8th International Symposium on
Conference_Location :
Aizu-Wakamatsu
Type :
conf
DOI :
10.1109/MCSoC.2014.41
Filename :
6949477