DocumentCode :
1693220
Title :
KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition
Author :
Dong Yu ; Kaisheng Yao ; Hang Su ; Gang Li ; Frank Seide
Author_Institution :
Microsoft Res., Redmond, WA, USA
fYear :
2013
Firstpage :
7893
Lastpage :
7897
Abstract :
We propose a novel regularized adaptation technique for context-dependent deep neural network hidden Markov models (CD-DNN-HMMs). The CD-DNN-HMM has a large output layer and many large hidden layers, each with thousands of neurons. The huge number of parameters in the CD-DNN-HMM makes adaptation a challenging task, especially when the adaptation set is small. The technique developed in this paper adapts the model conservatively by forcing the senone distribution estimated from the adapted model to stay close to that of the unadapted model. This constraint is realized by adding Kullback-Leibler divergence (KLD) regularization to the adaptation criterion. We show that applying this regularization is equivalent to changing the target distribution in the conventional backpropagation algorithm. Experiments on Xbox voice search, short message dictation, and Switchboard and lecture speech transcription tasks demonstrate that the proposed adaptation technique provides 2%-30% relative error reduction over already very strong speaker-independent CD-DNN-HMM systems, across different adaptation set sizes and under both supervised and unsupervised adaptation setups.
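The abstract's key observation, that KLD regularization is equivalent to changing the target distribution in backpropagation, can be illustrated with a minimal sketch. Here the adapted model is trained against an interpolation of the hard (one-hot) senone labels and the speaker-independent model's posteriors; the function name and the weight symbol `rho` are illustrative assumptions, not names from the paper.

```python
import numpy as np

def kld_regularized_targets(hard_targets, si_posteriors, rho):
    """Sketch of the KLD-regularized adaptation target: interpolate the
    one-hot senone labels with the unadapted (speaker-independent)
    model's posteriors.  `rho` (assumed name) is the regularization
    weight: rho=0 recovers plain adaptation on the hard labels,
    rho=1 keeps the unadapted model's distribution unchanged."""
    return (1.0 - rho) * hard_targets + rho * si_posteriors

# Toy example: 3 senones, the label is senone 1.
hard = np.array([0.0, 1.0, 0.0])          # one-hot adaptation label
si = np.array([0.2, 0.5, 0.3])            # unadapted model's posterior
target = kld_regularized_targets(hard, si, rho=0.5)
# target = [0.1, 0.75, 0.15]; it is still a valid distribution,
# so conventional cross-entropy backpropagation can use it directly.
```

Training then proceeds with ordinary backpropagation against `target` instead of `hard`, which is what makes the regularizer cheap to implement on top of an existing trainer.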
Keywords :
backpropagation; hidden Markov models; neural nets; speech recognition; CD-DNN-HMM; KL-divergence; Kullback-Leibler divergence; Xbox voice search; context dependent deep neural network; large vocabulary speech recognition; lecture speech transcription tasks; regularized deep neural network adaptation; relative error reduction; senone distribution; short message dictation; Switchboard tasks; target distribution; unadapted model; adaptation models; artificial neural networks; speaker adaptation; training; deep neural network; Kullback-Leibler divergence regularization;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639201
Filename :
6639201