Computing MMSE Estimates and Residual Uncertainty Directly in the Feature Domain of ASR using STFT Domain Speech Distortion Models

Author

Astudillo, Ramon Fernandez ; Orglmeister, Reinhold

Author_Institution

Spoken Language Syst. Lab., INESC-ID-Lisboa, Lisbon, Portugal

Volume

21

Issue

5

fYear

2013

fDate

May 2013

Firstpage

1023

Lastpage

1034

Abstract

In this paper we demonstrate how uncertainty propagation allows the computation of minimum mean square error (MMSE) estimates in the feature domain for various feature extraction methods using short-time Fourier transform (STFT) domain distortion models. The approach also yields a measure of estimate reliability, which allows either feature re-estimation or the dynamic compensation of automatic speech recognition (ASR) models. The proposed method transforms the posterior distribution associated with a Wiener filter through the feature extraction using the STFT Uncertainty Propagation formulas. It is also shown that non-linear estimators in the STFT domain, such as the Ephraim-Malah filters, can be seen as special cases of a propagation of the Wiener posterior. The method is illustrated by developing two MMSE Mel-frequency cepstral coefficient (MFCC) estimators and combining them with observation uncertainty techniques. We discuss similarities with other MMSE-MFCC estimators and show how the proposed approach outperforms conventional MMSE estimators in the STFT domain on the AURORA4 robust ASR task.
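
The core idea of propagating a Wiener posterior can be illustrated with a minimal sketch. It is not the authors' implementation, and it covers only one simple non-linear step (the power spectrum) rather than the full MFCC chain. It assumes the standard model in which the clean and noise STFT coefficients are independent, zero-mean, circular complex Gaussian variables with variances `lambda_x` and `lambda_d`; the function names and the example values are hypothetical.

```python
def wiener_posterior(y, lambda_x, lambda_d):
    """Posterior of a clean STFT coefficient X given noisy Y = X + D.

    Assumption: X ~ CN(0, lambda_x) and D ~ CN(0, lambda_d), independent,
    so p(X | Y) is complex Gaussian with the mean and variance returned.
    """
    gain = lambda_x / (lambda_x + lambda_d)  # Wiener gain
    mean = gain * y                          # Wiener estimate (posterior mean)
    var = gain * lambda_d                    # residual posterior variance
    return mean, var


def propagate_power(mean, var):
    """Mean and variance of |X|^2 when X ~ CN(mean, var)."""
    power_mean = abs(mean) ** 2 + var                  # MMSE power estimate
    power_var = var ** 2 + 2.0 * var * abs(mean) ** 2  # its residual uncertainty
    return power_mean, power_var


# Hypothetical single time-frequency bin.
y = 3.0 + 4.0j
mean, var = wiener_posterior(y, lambda_x=3.0, lambda_d=1.0)
power_mean, power_var = propagate_power(mean, var)
```

Under these assumptions, squaring the Wiener estimate alone would give `abs(mean) ** 2`, which omits the `var` term. The propagated mean is the MMSE estimate of the power itself, and the propagated variance is the kind of reliability measure that observation uncertainty techniques can consume.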

Keywords

Fourier transforms; Wiener filters; feature extraction; least mean squares methods; nonlinear estimation; speech recognition; AURORA4 robust ASR task; Ephraim-Malah filter; MFCC estimator; MMSE computation; MMSE-Mel-frequency cepstral coefficient; STFT domain speech distortion model; STFT uncertainty propagation formula; Wiener filter; automatic speech recognition; dynamic compensation; feature extraction method; minimum mean square error; nonlinear estimator; posterior distribution; short-time Fourier transform; Computational modeling; Feature extraction; Mel frequency cepstral coefficient; Robustness; Speech; Speech enhancement; Uncertainty; MMSE; uncertainty decoding; uncertainty propagation; Wiener filter;

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

ieee

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2013.2244085

Filename

6423820