Title :
A deep convolutional neural network based on nested residue number system
Author :
Hiroki Nakahara;Tsutomu Sasao
Author_Institution :
Ehime University, Japan
Abstract :
From a computational perspective, a pre-trained deep convolutional neural network (DCNN) is a feed-forward computation, and it is widely used in embedded vision systems. In the DCNN, the 2D convolutional operation occupies more than 90% of the computation time. Since the 2D convolutional operation performs massive multiply-accumulate (MAC) operations, conventional realizations could not implement a fully parallel DCNN. The residue number system (RNS) decomposes an integer into a tuple of L integers, namely its residues with respect to a set of moduli. Since the moduli must be pairwise coprime, the conventional RNS decomposes the MAC unit into circuits of different sizes. This means that the RNS cannot efficiently utilize the resources of an FPGA, which have a uniform size. In this paper, we propose the nested RNS (NRNS), which recursively decomposes the RNS. It can decompose the MAC unit into circuits of small size. In the DCNN using the NRNS, a 48-bit MAC unit is decomposed into 4-bit ones realized by look-up tables of the FPGA. The system also uses binary to NRNS converters and NRNS to binary converters. The binary to NRNS converter is realized by on-chip BRAMs, while the NRNS to binary converter is realized by DSP blocks and BRAMs. Thus, a balanced usage of FPGA resources leads to a high clock frequency with less hardware. The ImageNet DCNN using the NRNS is implemented on a Xilinx Virtex VC707 evaluation board. In terms of performance per area, measured in GOPS (giga operations per second) per slice, the proposed design is 5.86 times better than the best existing realization.
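The decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the moduli sets (7, 11, 13) and (5, 6, 7) and all function names are assumptions chosen for illustration, and the nesting is shown for a single outer digit only. The sketch shows how a MAC splits into independent small modular MACs, how the Chinese remainder theorem recovers the binary result, and how one residue digit can itself be computed inside a smaller inner RNS.

```python
from math import prod

# Illustrative moduli only (assumed, not taken from the paper).
OUTER_MODULI = (7, 11, 13)  # pairwise coprime, dynamic range 7*11*13 = 1001
INNER_MODULI = (5, 6, 7)    # pairwise coprime, dynamic range 5*6*7 = 210


def to_rns(x, moduli):
    """Decompose a non-negative integer into its tuple of residues."""
    return tuple(x % m for m in moduli)


def from_rns(residues, moduli):
    """Recover the integer modulo prod(moduli) via the Chinese remainder theorem."""
    total_range = prod(moduli)
    value = 0
    for r, m in zip(residues, moduli):
        partial = total_range // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        value += r * partial * pow(partial, -1, m)
    return value % total_range


def rns_mac(acc, a, b, moduli):
    """One multiply-accumulate step, done independently for each modulus."""
    return tuple((c + x * y) % m for c, x, y, m in zip(acc, a, b, moduli))


def nested_mac_digit(acc, x, y, outer_m, inner_moduli):
    """MAC on one outer digit (values in [0, outer_m)) using an inner RNS."""
    # The inner range must hold acc + x*y before reduction modulo outer_m.
    assert prod(inner_moduli) > (outer_m - 1) ** 2 + (outer_m - 1)
    inner = rns_mac(
        to_rns(acc, inner_moduli),
        to_rns(x, inner_moduli),
        to_rns(y, inner_moduli),
        inner_moduli,
    )
    return from_rns(inner, inner_moduli) % outer_m


def dot_rns(weights, pixels, moduli=OUTER_MODULI):
    """Dot product computed entirely in the RNS, converted back at the end."""
    acc = to_rns(0, moduli)
    for w, p in zip(weights, pixels):
        acc = rns_mac(acc, to_rns(w, moduli), to_rns(p, moduli), moduli)
    return from_rns(acc, moduli)


print(dot_rns([3, 5, 2], [4, 1, 6]))  # 3*4 + 5*1 + 2*6 = 29
```

In this sketch the result is only correct while the true dot product stays below the dynamic range (1001 for the assumed outer moduli). The inner set (5, 6, 7) was picked so that every inner digit fits in 3 bits, which mirrors the idea of replacing differently sized modular circuits with uniformly small ones.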
Keywords :
"Table lookup","Field programmable gate arrays","Convolution","Kernel","Neural networks","Clocks","Dynamic range"
Conference_Title :
Field Programmable Logic and Applications (FPL), 2015 25th International Conference on
DOI :
10.1109/FPL.2015.7293933