Title :
Small-footprint high-performance deep neural network-based speech recognition using split-VQ
Author :
Yongqiang Wang ; Jinyu Li ; Yifan Gong
Author_Institution :
Microsoft Corp., Redmond, WA, USA
Abstract :
Due to the large number of parameters in deep neural networks (DNNs), it is challenging to design a small-footprint DNN-based speech recognition system while maintaining high recognition performance. Even with a singular value decomposition (SVD) method and scalar quantization, the DNN model is still too large to be deployed on many mobile devices, and common practices such as reducing the number of hidden nodes often cause significant accuracy loss. In this work, we propose to split each row vector of the weight matrices into sub-vectors and quantize them into a set of codewords using a split vector quantization (split-VQ) algorithm. The codebook can be fine-tuned using back-propagation when aggressive quantization is performed. Experimental results demonstrate that the proposed method can further reduce the model size by 75% to 80% and save 10% to 50% of the computation on top of an already very compact SVD-DNN without noticeable performance degradation. This results in a 3.2 MB DNN that achieves recognition performance similar to that of a 59.1 MB standard DNN.
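The following is a minimal sketch of the split-VQ idea described in the abstract, not the authors' implementation: each row of a weight matrix is split into fixed-length sub-vectors, a shared codebook is learned over all sub-vectors (here with k-means), and each sub-vector is replaced by the index of its nearest codeword. The sub-vector length, codebook size, and the use of scikit-learn's KMeans are illustrative assumptions; the paper additionally fine-tunes the codebook with back-propagation, which this sketch omits.

```python
# Illustrative split-VQ sketch (assumed parameters), not the paper's code.
import numpy as np
from sklearn.cluster import KMeans


def split_vq(W, sub_dim=4, num_codewords=256, seed=0):
    """Split each row of W into length-sub_dim sub-vectors and quantize them."""
    rows, cols = W.shape
    assert cols % sub_dim == 0, "columns must be divisible by sub_dim"
    # All sub-vectors from all rows: shape (rows * cols / sub_dim, sub_dim)
    sub_vectors = W.reshape(-1, sub_dim)
    # Learn one shared codebook over the sub-vectors
    kmeans = KMeans(n_clusters=num_codewords, n_init=4, random_state=seed).fit(sub_vectors)
    codebook = kmeans.cluster_centers_           # (num_codewords, sub_dim) floats
    indices = kmeans.labels_.astype(np.uint8)    # one byte per sub-vector for <=256 codewords
    return codebook, indices.reshape(rows, cols // sub_dim)


def reconstruct(codebook, indices):
    """Rebuild an approximate weight matrix from the codebook and index table."""
    rows, n_sub = indices.shape
    sub_dim = codebook.shape[1]
    return codebook[indices].reshape(rows, n_sub * sub_dim)


if __name__ == "__main__":
    W = np.random.randn(512, 1024).astype(np.float32)   # toy hidden-layer weights
    codebook, idx = split_vq(W, sub_dim=4, num_codewords=256)
    W_hat = reconstruct(codebook, idx)
    # Storage drops from 32 bits per weight to roughly 2 bits per weight
    # (an 8-bit index covering 4 weights, plus the small codebook).
    print("reconstruction RMSE:", np.sqrt(np.mean((W - W_hat) ** 2)))
```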
Keywords :
neural nets; quantisation (signal); singular value decomposition; speech recognition; codewords; scalar quantization; singular value matrix decomposition method; small-footprint high-performance deep neural network; speech recognition; split vector quantization algorithm; split-VQ; Accuracy; Acoustics; Hidden Markov models; Matrix decomposition; Neural networks; Quantization (signal); Speech recognition; DNN; model compression; on device speech recognition; split-VQ;
Conference_Titel :
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
South Brisbane, QLD, Australia
DOI :
10.1109/ICASSP.2015.7178919