DocumentCode :
3132047
Title :
Context-dependent Deep Neural Networks for audio indexing of real-life data
Author :
Gang Li ; Huifeng Zhu ; Gong Cheng ; Thambiratnam, Kavintheran ; Chitsaz, B. ; Dong Yu ; Seide, Frank
Author_Institution :
Microsoft Res. Asia, Beijing, China
fYear :
2012
fDate :
2-5 Dec. 2012
Firstpage :
143
Lastpage :
148
Abstract :
We apply Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, to the real-life problem of audio indexing of data across various sources. We recently showed that, on the Switchboard benchmark for speaker-independent transcription of phone calls, CD-DNN-HMMs with 7 hidden layers reduce the word error rate by as much as one-third compared to discriminatively trained Gaussian-mixture HMMs, and by one-fourth if the GMM-HMM also uses fMPE features. This paper takes CD-DNN-HMM-based recognition into a real-life deployment for audio indexing. We find that for our best speaker-independent CD-DNN-HMM, with 32k senones trained on 2000 h of data, the one-fourth reduction does carry over to inhomogeneous field data (video podcasts and talks). Compared to a speaker-adaptive GMM system, the relative improvement is 18%, at very similar end-to-end runtime. In system building, we find that DNNs can benefit from a larger number of senones than the GMM-HMM, and that DNN likelihood evaluation is a sizeable runtime factor even in our wide-beam context of generating rich lattices: cutting the model size by 60% reduces runtime by one-third at a 5% relative WER loss.
Keywords :
Gaussian processes; neural nets; speech recognition; CD-DNN-HMM; Gaussian-mixture HMM; audio indexing; context dependent deep neural network HMM; real-life data; speaker adaptive GMM system; speaker independent transcription; switchboard benchmark; Acoustics; Computational modeling; Hidden Markov models; Indexing; Runtime; Switches; deep learning; deep neural networks
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Spoken Language Technology Workshop (SLT), 2012 IEEE
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4673-5125-6
Electronic_ISBN :
978-1-4673-5124-9
Type :
conf
DOI :
10.1109/SLT.2012.6424212
Filename :
6424212