DocumentCode :
3164956
Title :
Improvements to VTS feature enhancement
Author :
Li, Jinyu ; Seltzer, Michael L. ; Gong, Yifan
Author_Institution :
Microsoft Corp., Redmond, WA, USA
fYear :
2012
fDate :
25-30 March 2012
Firstpage :
4677
Lastpage :
4680
Abstract :
By explicitly modelling the distortion of speech signals, model adaptation approaches based on vector Taylor series (VTS) have been shown to significantly improve the robustness of speech recognizers to environmental noise. However, the computational cost of VTS model adaptation (MVTS) methods hinders their wide use, because they must adapt all the HMM parameters for every utterance at runtime. In contrast, VTS feature enhancement (FVTS) methods are computationally cheaper because they need neither multiple decoding passes nor adaptation of all the HMM parameters. In this paper, we propose two improvements to VTS feature enhancement: updating all of the environment distortion parameters, and noise adaptive training of the front-end GMM. In addition, we investigate other performance-related issues, such as the selection of FVTS algorithms and the spectrum domain from which the MFCCs are extracted. As an important result of our investigation, we establish that the FVTS method can achieve accuracy comparable to that of the MVTS method at a smaller runtime cost. This makes the FVTS method an ideal candidate for real-world tasks.
Keywords :
decoding; distortion; feature extraction; hidden Markov models; speech enhancement; speech recognition; FVTS methods; HMM model parameters; MVTS methods; VTS feature enhancement methods; VTS model adaptation method; computational cost; environment distortion parameters; environmental noise; front-end GMM; multiple decoding passes; noise adaptive training; real world tasks; speech recognizers; speech signals distortion; vector Taylor series-based model adaptation; Accuracy; Adaptation models; Hidden Markov models; Noise; Noise measurement; Nonlinear distortion; Speech; VTS; feature enhancement; model adaptation; robust ASR;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
ISSN :
1520-6149
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2012.6288962
Filename :
6288962