DocumentCode
34976
Title
Computing MMSE Estimates and Residual Uncertainty Directly in the Feature Domain of ASR using STFT Domain Speech Distortion Models
Author
Astudillo, Ramon Fernandez ; Orglmeister, Reinhold
Author_Institution
Spoken Language Syst. Lab., INESC-ID-Lisboa, Lisbon, Portugal
Volume
21
Issue
5
fYear
2013
fDate
May 2013
Firstpage
1023
Lastpage
1034
Abstract
In this paper we demonstrate how uncertainty propagation allows the computation of minimum mean square error (MMSE) estimates in the feature domain for various feature extraction methods using short-time Fourier transform (STFT) domain distortion models. The approach also yields a measure of estimate reliability, which allows either feature re-estimation or the dynamic compensation of automatic speech recognition (ASR) models. The proposed method transforms the posterior distribution associated with a Wiener filter through the feature extraction using the STFT uncertainty propagation formulas. It is also shown that non-linear estimators in the STFT domain, such as the Ephraim-Malah filters, can be seen as special cases of a propagation of the Wiener posterior. The method is illustrated by developing two MMSE Mel-frequency cepstral coefficient (MMSE-MFCC) estimators and combining them with observation uncertainty techniques. We discuss similarities with other MMSE-MFCC estimators and show how the proposed approach outperforms conventional MMSE estimators in the STFT domain on the AURORA4 robust ASR task.
Keywords
Fourier transforms; Wiener filters; feature extraction; least mean squares methods; nonlinear estimation; speech recognition; AURORA4 robust ASR task; Ephraim-Malah filter; MFCC estimator; MMSE computation; MMSE-Mel-frequency cepstral coefficient; STFT domain speech distortion model; STFT uncertainty propagation formula; Wiener filter; automatic speech recognition; dynamic compensation; feature extraction method; minimum mean square error; nonlinear estimator; posterior distribution; short-time Fourier transform; Computational modeling; Feature extraction; Mel frequency cepstral coefficient; Robustness; Speech; Speech enhancement; Uncertainty; MMSE; uncertainty decoding; uncertainty propagation; Wiener filter
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2013.2244085
Filename
6423820