DocumentCode :
34976
Title :
Computing MMSE Estimates and Residual Uncertainty Directly in the Feature Domain of ASR using STFT Domain Speech Distortion Models
Author :
Astudillo, Ramon Fernandez ; Orglmeister, Reinhold
Author_Institution :
Spoken Language Syst. Lab., INESC-ID-Lisboa, Lisbon, Portugal
Volume :
21
Issue :
5
fYear :
2013
fDate :
May 2013
Firstpage :
1023
Lastpage :
1034
Abstract :
In this paper, we demonstrate how uncertainty propagation allows the computation of minimum mean square error (MMSE) estimates in the feature domain for various feature extraction methods using short-time Fourier transform (STFT) domain distortion models. In addition, a measure of estimate reliability is obtained, which allows either feature re-estimation or the dynamic compensation of automatic speech recognition (ASR) models. The proposed method transforms the posterior distribution associated with a Wiener filter through the feature extraction using the STFT Uncertainty Propagation formulas. It is also shown that non-linear estimators in the STFT domain, such as the Ephraim-Malah filters, can be seen as special cases of a propagation of the Wiener posterior. The method is illustrated by developing two MMSE Mel-frequency cepstral coefficient (MFCC) estimators and combining them with observation uncertainty techniques. We discuss similarities with other MMSE-MFCC estimators and show how the proposed approach outperforms conventional MMSE estimators in the STFT domain on the AURORA4 robust ASR task.
Keywords :
Fourier transforms; Wiener filters; feature extraction; least mean squares methods; nonlinear estimation; speech recognition; AURORA4 robust ASR task; Ephraim-Malah filter; MFCC estimator; MMSE computation; MMSE-Mel-frequency cepstral coefficient; STFT domain speech distortion model; STFT uncertainty propagation formula; Wiener filter; automatic speech recognition; dynamic compensation; feature extraction method; minimum mean square error; nonlinear estimator; posterior distribution; short-time Fourier transform; Computational modeling; Feature extraction; Mel frequency cepstral coefficient; Robustness; Speech; Speech enhancement; Uncertainty; MMSE; uncertainty decoding; uncertainty propagation; Wiener filter
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2013.2244085
Filename :
6423820