A propagation approach to modelling the joint distributions of clean and corrupted speech in the Mel-Cepstral domain

Author

Fernandez Astudillo, Ramon

Author_Institution

Spoken Language Syst. Lab., INESC-ID-Lisboa, Lisbon, Portugal

fYear

2013

fDate

8-12 Dec. 2013

Firstpage

180

Lastpage

185

Abstract

This paper presents a closed-form solution relating the joint distributions of corrupted and clean speech in the short-time Fourier transform (STFT) and Mel-frequency cepstral coefficient (MFCC) domains. This enables tighter integration of STFT-domain speech enhancement with feature- and model-compensation techniques for robust automatic speech recognition. The approach directly uses the conventional speech distortion model for STFT speech enhancement, allowing for low-cost, single-pass, causal implementations. Unlike similar uncertainty propagation approaches, it provides the full joint distribution rather than just the posterior distribution, which opens up additional model compensation possibilities. The method is exemplified by deriving an MMSE-MFCC estimator from the propagated joint distribution. It is shown that performance similar to that of STFT uncertainty propagation (STFT-UP) can be obtained on the AURORA4 task while also deriving the full joint distribution.

Keywords

Fourier transforms; cepstral analysis; speech enhancement; speech recognition; MFCC domain; Mel-cepstral domain; Mel-frequency cepstral coefficient; STFT speech enhancement; STFT uncertainty propagation; clean speech distribution; closed form solution; corrupted speech distribution; model compensation technique; propagated joint distribution; propagation approach; robust automatic speech recognition; short time Fourier transform; speech distortion model; Computational modeling; Hidden Markov models; Joints; Mel frequency cepstral coefficient; Speech; Speech enhancement; Uncertainty; Modified Imputation; Speech Enhancement; Uncertainty Decoding; Uncertainty Propagation;

fLanguage

English

Publisher

IEEE

Conference_Titel

Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on

Conference_Location

Olomouc

Type

conf

DOI

10.1109/ASRU.2013.6707726

Filename

6707726