Title :
A Feature Compensation Approach Using High-Order Vector Taylor Series Approximation of an Explicit Distortion Model for Noisy Speech Recognition
Author :
Du, Jun ; Huo, Qiang
Author_Institution :
Visual Computing Group, Microsoft Research Asia (MSRA), Beijing, China
Abstract :
This paper presents a new feature compensation approach to noisy speech recognition that uses a high-order vector Taylor series (HOVTS) approximation of an explicit model of environmental distortions. Formulations are derived for maximum-likelihood (ML) estimation of both additive noises and convolutional distortions, and for minimum mean squared error (MMSE) estimation of clean speech. Experimental results on the Aurora2 and Aurora4 benchmark databases, where the modeling assumption of the distortion model is more accurate, demonstrate that the standard HOVTS-based feature compensation approaches achieve consistent and significant improvements in recognition accuracy compared to the traditional first-order VTS-based approach. For a real-world in-vehicle connected-digit recognition task on the Aurora3 benchmark database, where the modeling assumption of the distortion model is less accurate, modifications are necessary to make VTS-based feature compensation approaches work. In this case, the second-order VTS-based approach performs only slightly better than the first-order VTS-based approach.
Keywords :
compensation; least mean squares methods; maximum likelihood estimation; speech recognition; Aurora2 benchmark database; Aurora3 benchmark database; Aurora4 benchmark database; HOVTS approximation; ML estimation; MMSE estimation; additive noises; convolutional distortions; environmental distortions; explicit distortion model; feature compensation approach; first-order VTS-based approach; high-order vector Taylor series approximation; in-vehicle connected digit recognition task; minimum mean squared error estimation; noisy speech recognition; Approximation methods; Cepstral analysis; Hidden Markov models; Nonlinear distortion; Signal to noise ratio; Speech; Speech recognition; Distortion model; feature compensation; noise robustness; robust speech recognition; vector Taylor series (VTS)
Journal_Title :
IEEE Transactions on Audio, Speech, and Language Processing
DOI :
10.1109/TASL.2011.2129508