Title :
Incoherent training of deep neural networks to de-correlate bottleneck features for speech recognition
Author :
Yebo Bao ; Hui Jiang ; Lirong Dai ; Cong Liu
Author_Institution :
Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China
Abstract :
Recently, the hybrid model combining a deep neural network (DNN) with context-dependent HMMs has achieved dramatic gains over the conventional GMM/HMM method in many speech recognition tasks. In this paper, we study how to compete with the state-of-the-art DNN/HMM method within the traditional GMM/HMM framework. Instead of using the DNN as an acoustic model, we use it as a front-end bottleneck (BN) feature extractor that decorrelates long feature vectors concatenated from several consecutive speech frames. More importantly, we propose two novel incoherent training methods that explicitly de-correlate BN features during DNN learning. The first method minimizes the coherence of the DNN weight matrices, while the second minimizes the correlation coefficients of the BN features computed over each mini-batch during DNN training. Experimental results on a 70-hr Mandarin transcription task and the 309-hr Switchboard task show that traditional GMM/HMMs using BN features yield performance comparable to DNN/HMMs. The proposed incoherent training produces a 2-3% additional gain over the baseline BN features. Finally, discriminatively trained GMM/HMMs using incoherently trained BN features consistently surpass the state-of-the-art DNN/HMMs on all evaluated tasks.
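The second incoherent training criterion described above, penalizing the correlation coefficients of BN features over a mini-batch, can be illustrated with a minimal NumPy sketch. This is a hypothetical reconstruction from the abstract alone, not the paper's actual implementation; the function name and the mean-squared-off-diagonal form of the penalty are assumptions.

```python
import numpy as np

def incoherence_penalty(bn_feats):
    """Hypothetical sketch of a mini-batch decorrelation penalty.

    bn_feats: array of shape (batch_size, bn_dim) holding bottleneck
    activations for one mini-batch. Returns the mean squared
    correlation coefficient between distinct BN dimensions; the value
    is 0 when the features are fully decorrelated and approaches 1
    when dimensions are perfectly correlated.
    """
    # Correlation matrix across feature dimensions
    # (rowvar=False: columns are the variables).
    corr = np.corrcoef(bn_feats, rowvar=False)
    d = corr.shape[0]
    # Zero out the diagonal (self-correlations are always 1).
    off_diag = corr - np.eye(d)
    # Average squared off-diagonal correlation coefficient.
    return np.sum(off_diag ** 2) / (d * (d - 1))
```

In training, a penalty of this kind would be added as a regularization term to the DNN objective so that its gradient pushes the bottleneck-layer weights toward producing decorrelated features, which suits the diagonal-covariance Gaussians of a conventional GMM/HMM back end.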
Keywords :
correlation methods; feature extraction; hidden Markov models; neural nets; speech recognition; context dependent HMM; correlation coefficients; decorrelate bottleneck features; deep neural networks; front end bottleneck feature extraction method; incoherent training; speech recognition; weight matrices; Correlation; Feature extraction; Hidden Markov models; Neural networks; Speech recognition; Training; Vectors; Deep neural networks (DNN); bottleneck features; incoherent training; large vocabulary continuous speech recognition (LVCSR); nonlinear dimensionality reduction;
Conference_Title :
2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6639015