Improvements to VTS feature enhancement

Author

Li, Jinyu ; Seltzer, Michael L. ; Gong, Yifan

Author_Institution

Microsoft Corp., Redmond, WA, USA

fYear

2012

fDate

25-30 March 2012

Firstpage

4677

Lastpage

4680

Abstract

By explicitly modelling the distortion of speech signals, model adaptation approaches based on vector Taylor series (VTS) have been shown to significantly improve the robustness of speech recognizers to environmental noise. However, the computational cost of VTS model adaptation (MVTS) methods hinders their wide use because they must adapt all the HMM parameters for every utterance at runtime. In contrast, VTS feature enhancement (FVTS) methods are computationally cheaper because they neither need multiple decoding passes nor adapt all the HMM parameters. In this paper, we propose two improvements to VTS feature enhancement: updating all of the environment distortion parameters and noise adaptive training of the front-end GMM. In addition, we investigate other performance-related issues, such as the selection of FVTS algorithms and the spectrum domain from which MFCCs are extracted. As an important result of our investigation, we establish that the FVTS method can achieve accuracy comparable to that of the MVTS method at a smaller runtime cost. This makes the FVTS method an ideal candidate for real-world tasks.

Keywords

decoding; distortion; feature extraction; hidden Markov models; speech enhancement; speech recognition; FVTS methods; HMM model parameters; MVTS methods; VTS feature enhancement methods; VTS model adaptation method; computational cost; environment distortion parameters; environmental noise; front-end GMM; multiple decoding passes; noise adaptive training; real world tasks; speech recognizers; speech signals distortion; vector Taylor series-based model adaptation; Accuracy; Adaptation models; Hidden Markov models; Noise; Noise measurement; Nonlinear distortion; Speech; VTS; feature enhancement; model adaptation; robust ASR;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location

Kyoto

ISSN

1520-6149

Print_ISBN

978-1-4673-0045-2

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2012.6288962

Filename

6288962