DocumentCode :
3738076
Title :
FPGA implementation of a SIMD-based array processor with torus interconnect
Author :
Yuki Murakami
Author_Institution :
Graduate School of Computer Science and Engineering, University of Aizu, Japan
fYear :
2015
Firstpage :
244
Lastpage :
247
Abstract :
Matrix computations are a fundamental tool in scientific and engineering applications. Among these applications, Convolutional Neural Networks (CNNs), which can be computed effectively by matrix-matrix multiplication, are becoming popular, so an efficient CNN implementation is highly important. In this study, we designed a parallel processor for matrix computations using a torus interconnect topology and implemented Cannon's algorithm for matrix-matrix multiply-add. We evaluated the scalability of the proposed processor on a reconfigurable FPGA platform. More precisely, the designed processor, with 8 × 8 functional units using 16-bit floating-point multiply-add units, was evaluated on a Cyclone IV FPGA chip and achieved a performance of 27 GFlops. We also implemented CNN calculations on our processor and compared the matrix-based approach with our proposed method. As a result, our method is 25 times faster than the matrix-based approach when the processor has 8 × 8 functional units, the image size is 32 × 32, and the filter size is 5 × 5.
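For readers unfamiliar with Cannon's algorithm, the following is a minimal software sketch of the multiply-add C + A·B on a simulated n × n torus, with one matrix element per processing element. It is not the authors' hardware design: the function name `cannon_matmul` and the list-of-lists representation are assumptions made for illustration, and the rotations stand in for the wraparound links of the torus interconnect.

```python
def cannon_matmul(a, b, c=None):
    """Return C + A*B for square n x n matrices using Cannon's algorithm.

    Hypothetical illustration: each (i, j) entry plays the role of one
    processing element on an n x n torus.
    """
    n = len(a)
    # Accumulator starts from C (multiply-add) or from zeros.
    acc = [row[:] for row in c] if c is not None else [[0] * n for _ in range(n)]

    # Initial skew: row i of A rotates left by i, column j of B rotates up by j.
    a_loc = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b_loc = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]

    for _ in range(n):
        # Every element multiplies its local operands and accumulates.
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_loc[i][j] * b_loc[i][j]
        # Shift A one step left and B one step up, with torus wraparound.
        a_loc = [[a_loc[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b_loc = [[b_loc[(i + 1) % n][j] for j in range(n)] for i in range(n)]

    return acc
```

After the initial skew, each of the n steps needs only nearest-neighbour transfers, which is why the algorithm maps naturally onto a torus interconnect.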
Keywords :
"Arrays","Convolution","Field programmable gate arrays","Radio frequency","Hardware design languages","Hardware","Ports (Computers)"
Publisher :
IEEE
Conference_Titel :
Field Programmable Technology (FPT), 2015 International Conference on
Type :
conf
DOI :
10.1109/FPT.2015.7393159
Filename :
7393159