Title :
Incoherent training of deep neural networks to de-correlate bottleneck features for speech recognition
Author :
Yebo Bao ; Hui Jiang ; Lirong Dai ; Cong Liu
Author_Institution :
Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China
Abstract :
Recently, the hybrid model combining a deep neural network (DNN) with context-dependent HMMs has achieved dramatic gains over the conventional GMM/HMM method in many speech recognition tasks. In this paper, we study how to compete with the state-of-the-art DNN/HMM method within the traditional GMM/HMM framework. Instead of using the DNN as an acoustic model, we use it as a front-end bottleneck (BN) feature extractor that decorrelates long feature vectors concatenated from several consecutive speech frames. More importantly, we propose two novel incoherent training methods that explicitly de-correlate BN features during DNN learning. The first method minimizes the coherence of the DNN weight matrices, while the second minimizes the correlation coefficients of the BN features computed over each mini-batch during DNN training. Experimental results on a 70-hr Mandarin transcription task and the 309-hr Switchboard task show that traditional GMM/HMMs using BN features yield performance comparable to DNN/HMMs. The proposed incoherent training produces a 2-3% additional gain over the baseline BN features. Finally, discriminatively trained GMM/HMMs using incoherently trained BN features consistently surpass the state-of-the-art DNN/HMMs on all evaluated tasks.
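The second incoherent training criterion described above, penalizing the correlation coefficients of BN features over a mini-batch, can be illustrated with a minimal NumPy sketch. This is a hypothetical reconstruction from the abstract alone, not the paper's actual implementation; the function name and the mean-squared-off-diagonal form of the penalty are assumptions.

```python
import numpy as np

def incoherence_penalty(bn_feats):
    """Hypothetical sketch of a mini-batch decorrelation penalty.

    bn_feats: array of shape (batch_size, bn_dim) holding bottleneck
    activations for one mini-batch. Returns the mean squared
    correlation coefficient between distinct BN dimensions; the value
    is 0 when the features are fully decorrelated and approaches 1
    when dimensions are perfectly correlated.
    """
    # Correlation matrix across feature dimensions
    # (rowvar=False: columns are the variables).
    corr = np.corrcoef(bn_feats, rowvar=False)
    d = corr.shape[0]
    # Zero out the diagonal (self-correlations are always 1).
    off_diag = corr - np.eye(d)
    # Average squared off-diagonal correlation coefficient.
    return np.sum(off_diag ** 2) / (d * (d - 1))
```

In training, a penalty of this kind would be added as a regularization term to the DNN objective so that its gradient pushes the bottleneck-layer weights toward producing decorrelated features, which suits the diagonal-covariance Gaussians of a conventional GMM/HMM back end.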
Keywords :
correlation methods; feature extraction; hidden Markov models; neural nets; speech recognition; context dependent HMM; correlation coefficients; decorrelate bottleneck features; deep neural networks; front end bottleneck feature extraction method; incoherent training; speech recognition; weight matrices; Correlation; Feature extraction; Hidden Markov models; Neural networks; Speech recognition; Training; Vectors; Deep neural networks (DNN); bottleneck features; incoherent training; large vocabulary continuous speech recognition (LVCSR); nonlinear dimensionality reduction;
Conference_Title :
2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6639015