DocumentCode :
3123684
Title :
Investigation of deep neural networks (DNN) for large vocabulary continuous speech recognition: Why DNN surpasses GMMs in acoustic modeling
Author :
Jia Pan ; Cong Liu ; Zhiguo Wang ; Yu Hu ; Hui Jiang
Author_Institution :
iFlytek Res., Hefei, China
fYear :
2012
fDate :
5-8 Dec. 2012
Firstpage :
301
Lastpage :
305
Abstract :
Recently, context-dependent deep neural networks (DNNs) have been reported to achieve unprecedented gains in many challenging ASR tasks, including the well-known Switchboard task. In this paper, we first investigate DNNs for several large vocabulary speech recognition tasks. Our results confirm that DNNs consistently achieve about 25-30% relative error reduction over the best discriminatively trained GMMs, even in ASR tasks with up to 700 hours of training data. Next, we conduct a series of experiments to study where the gain of DNNs comes from. Our experiments show that the gain is almost entirely attributable to the DNN's input feature vectors, which are concatenated from several consecutive speech frames within a relatively long context window. Finally, we propose a few ideas for reconfiguring the DNN input features, such as using logarithm spectrum features or VTLN-normalized features. Our results show that each of these methods yields over 3% relative error reduction over the traditional MFCC or PLP features in DNNs.
Keywords :
Gaussian processes; acoustic signal processing; learning (artificial intelligence); neural nets; speech recognition; vectors; vocabulary; ASR tasks; DNN input feature vectors; GMMs; Gaussian mixture models; VTLN normalized features; acoustic modeling; automatic speech recognition; context-dependent deep neural network; discriminatively trained GMM; large vocabulary continuous speech recognition; logarithm spectrum features; relative error reduction; speech frames; Switchboard task; training data; Context; Hidden Markov models; Neural networks; Speech recognition; Switches; Training; Vectors; acoustic modeling; deep neural networks; pre-training; speech recognition
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2012 8th International Symposium on
Conference_Location :
Kowloon
Print_ISBN :
978-1-4673-2506-6
Electronic_ISBN :
978-1-4673-2505-9
Type :
conf
DOI :
10.1109/ISCSLP.2012.6423452
Filename :
6423452