Title :
Computing MMSE Estimates and Residual Uncertainty Directly in the Feature Domain of ASR using STFT Domain Speech Distortion Models
Author :
Astudillo, Ramon Fernandez ; Orglmeister, Reinhold
Author_Institution :
Spoken Language Syst. Lab., INESC-ID-Lisboa, Lisbon, Portugal
Abstract :
In this paper we demonstrate how uncertainty propagation allows the computation of minimum mean square error (MMSE) estimates in the feature domain for various feature extraction methods using short-time Fourier transform (STFT) domain distortion models. The approach also yields a measure of estimate reliability, which allows either feature re-estimation or the dynamic compensation of automatic speech recognition (ASR) models. The proposed method transforms the posterior distribution associated with a Wiener filter through the feature extraction using the STFT uncertainty propagation formulas. It is also shown that non-linear STFT-domain estimators such as the Ephraim-Malah filters can be seen as special cases of a propagation of the Wiener posterior. The method is illustrated by developing two MMSE Mel-frequency cepstral coefficient (MFCC) estimators and combining them with observation uncertainty techniques. We discuss similarities with other MMSE-MFCC estimators and show that the proposed approach outperforms conventional MMSE estimators in the STFT domain on the AURORA4 robust ASR task.
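As a rough illustration of the idea summarized in the abstract, the following Python sketch computes the posterior of a clean STFT coefficient under a Wiener filter and pushes it through a non-linear feature. It is a sketch under stated assumptions, not the paper's implementation: it assumes an additive model Y = X + D with zero-mean circular complex Gaussian speech and noise, the function names `wiener_posterior` and `propagate_monte_carlo` are hypothetical, and plain Monte Carlo sampling stands in for the closed-form STFT uncertainty propagation formulas the paper refers to.

```python
import numpy as np


def wiener_posterior(Y, lambda_x, lambda_d):
    """Posterior of clean coefficient X given Y = X + D.

    Assumes zero-mean circular complex Gaussian priors with variances
    lambda_x (speech) and lambda_d (noise). Under that assumption the
    posterior is complex Gaussian with the mean and variance returned here.
    """
    gain = lambda_x / (lambda_x + lambda_d)  # Wiener gain
    mean = gain * Y                          # Wiener estimate (posterior mean)
    var = gain * lambda_d                    # residual (posterior) variance
    return mean, var


def propagate_monte_carlo(mean, var, feature_fn, n_samples=20000, seed=0):
    """Approximate the mean and variance of feature_fn(X) under the posterior.

    Sampling is a stand-in for analytic propagation; feature_fn is any
    element-wise or frame-wise transformation applied to the samples.
    """
    rng = np.random.default_rng(seed)
    shape = (n_samples,) + np.shape(mean)
    # Circular complex Gaussian: real and imaginary parts each carry var / 2.
    noise = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    samples = mean + np.sqrt(var / 2.0) * noise
    feats = feature_fn(samples)
    return feats.mean(axis=0), feats.var(axis=0)


# Hypothetical single-bin example: propagate through the spectral amplitude.
Y = np.array([2.0 + 0.0j])
post_mean, post_var = wiener_posterior(Y, lambda_x=3.0, lambda_d=1.0)
amp_mean, amp_var = propagate_monte_carlo(post_mean, post_var, np.abs)
```

In this example the propagated amplitude mean is larger than the magnitude of the Wiener estimate, which is consistent with the abstract's remark that non-linear STFT-domain estimators can be viewed as propagations of the Wiener posterior. The propagated variance plays the role of the residual uncertainty that observation uncertainty techniques would consume.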
Keywords :
Fourier transforms; Wiener filters; feature extraction; least mean squares methods; nonlinear estimation; speech recognition; AURORA4 robust ASR task; Ephraim-Malah filter; MFCC estimator; MMSE computation; MMSE-Mel-frequency cepstral coefficient; STFT domain speech distortion model; STFT uncertainty propagation formula; Wiener filter; automatic speech recognition; dynamic compensation; feature extraction method; minimum mean square error; nonlinear estimator; posterior distribution; short-time Fourier transform; Computational modeling; Feature extraction; Mel frequency cepstral coefficient; Robustness; Speech; Speech enhancement; Uncertainty; MMSE; uncertainty decoding; uncertainty propagation; Wiener filter
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2013.2244085