X1000 real-time phoneme recognition VLSI using feed-forward deep neural networks

Author

Jonghong Kim ; Kyuyeon Hwang ; Wonyong Sung

Author_Institution

Dept. of Electr. & Comput. Eng., Seoul Nat. Univ., Seoul, South Korea

fYear

2014

fDate

4-9 May 2014

Firstpage

7510

Lastpage

7514

Abstract

Deep neural networks perform very well in phoneme and speech recognition applications when compared with the previously used GMM (Gaussian mixture model)-based approaches. However, efficient implementation of deep neural networks is difficult because the network must be very large when high recognition accuracy is demanded. In this work, we develop a digital VLSI for phoneme recognition using deep neural networks and assess the design in terms of throughput, chip size, and power consumption. The developed VLSI employs a fixed-point optimization method that uses only +Δ, 0, and -Δ to represent each weight. The design employs 1,024 simple processing units in each layer, a number that can be scaled easily according to the needed throughput; the throughput of the architecture varies from 62.5 to 1,000 times the real-time processing speed.
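
To illustrate the three-level weight representation mentioned in the abstract, the following is a minimal sketch, not the authors' optimization method. It assumes each trained weight is simply rounded to the nearest of +Δ, 0, and -Δ (thresholds at ±Δ/2) and that Δ is given; the abstract does not state how Δ is chosen or how the weights are optimized. The function names `quantize_ternary` and `ternary_dot` and the sample values are hypothetical.

```python
def quantize_ternary(weights, delta):
    """Round each weight to the nearest of -delta, 0, +delta.

    Returns integer codes in {-1, 0, +1}; the represented weight is
    code * delta. Nearest-level rounding is an assumption made for
    this sketch, not a rule taken from the paper.
    """
    half = delta / 2.0
    codes = []
    for w in weights:
        if w > half:
            codes.append(1)
        elif w < -half:
            codes.append(-1)
        else:
            codes.append(0)
    return codes


def ternary_dot(codes, inputs, delta):
    """Weighted sum that needs only additions and subtractions.

    A single multiplication by delta is applied to the accumulated sum.
    """
    acc = 0.0
    for c, x in zip(codes, inputs):
        if c == 1:
            acc += x
        elif c == -1:
            acc -= x
        # c == 0: the input is skipped entirely
    return delta * acc


weights = [0.31, -0.02, -0.27, 0.12, 0.05]  # hypothetical trained weights
inputs = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical unit inputs
delta = 0.25                                # hypothetical step size

codes = quantize_ternary(weights, delta)
result = ternary_dot(codes, inputs, delta)
print(codes, result)
```

With weights restricted to three levels, each weight fits in two bits and the per-weight multiplier in a processing unit can be replaced by an add, a subtract, or a skip, which is one plausible reason such a representation suits a design with many simple processing units.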

Keywords

Gaussian processes; VLSI; feedforward neural nets; mixture models; optimisation; power consumption; speech recognition; GMM; chip size; digital VLSI; feed-forward deep neural networks; fixed-point optimization method; Gaussian mixture model; high recognition accuracy; power consumption; real-time phoneme recognition VLSI; real-time processing speed; speech recognition applications; Clocks; Computer architecture; Neural networks; Real-time systems; Registers; Throughput; Very large scale integration; Deep neural network; VLSI; fixed-point optimization; phoneme recognition

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location

Florence

Type

conf

DOI

10.1109/ICASSP.2014.6855060

Filename

6855060