  • DocumentCode
    34976
  • Title
    Computing MMSE Estimates and Residual Uncertainty Directly in the Feature Domain of ASR using STFT Domain Speech Distortion Models
  • Author
    Astudillo, Ramon Fernandez; Orglmeister, Reinhold
  • Author_Institution
    Spoken Language Syst. Lab., INESC-ID-Lisboa, Lisbon, Portugal
  • Volume
    21
  • Issue
    5
  • fYear
    2013
  • fDate
    May 2013
  • Firstpage
    1023
  • Lastpage
    1034
  • Abstract
    In this paper, we demonstrate how uncertainty propagation allows the computation of minimum mean square error (MMSE) estimates in the feature domain for various feature extraction methods using short-time Fourier transform (STFT) domain distortion models. In addition, a measure of estimate reliability is obtained, which allows either feature re-estimation or the dynamic compensation of automatic speech recognition (ASR) models. The proposed method transforms the posterior distribution associated with a Wiener filter through the feature extraction using the STFT Uncertainty Propagation formulas. It is also shown that non-linear estimators in the STFT domain, such as the Ephraim-Malah filters, can be seen as special cases of a propagation of the Wiener posterior. The method is illustrated by developing two MMSE-Mel-frequency Cepstral Coefficient (MFCC) estimators and combining them with observation uncertainty techniques. We discuss similarities with other MMSE-MFCC estimators and show how the proposed approach outperforms conventional MMSE estimators in the STFT domain on the AURORA4 robust ASR task.
  • Keywords
    Fourier transforms; Wiener filters; feature extraction; least mean squares methods; nonlinear estimation; speech recognition; AURORA4 robust ASR task; Ephraim-Malah filter; MFCC estimator; MMSE computation; MMSE-Mel-frequency cepstral coefficient; STFT domain speech distortion model; STFT uncertainty propagation formula; Wiener filter; automatic speech recognition; dynamic compensation; feature extraction method; minimum mean square error; nonlinear estimator; posterior distribution; short-time Fourier transform; Computational modeling; Feature extraction; Mel frequency cepstral coefficient; Robustness; Speech; Speech enhancement; Uncertainty; MMSE; uncertainty decoding; uncertainty propagation; Wiener filter
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2013.2244085
  • Filename
    6423820