Regional Information Center for Science and Technology (RICeST) - Model-Based Feature Enhancement for Reverberant Speech Recognition

DocumentCode :

1486160

Title :

Model-Based Feature Enhancement for Reverberant Speech Recognition

Author :

Krueger, Alexander ; Haeb-Umbach, Reinhold

Author_Institution :

Dept. of Commun., Univ. of Paderborn, Paderborn, Germany

Volume :

Issue :

Year :

2010

Firstpage :

1692

Lastpage :

1707

Abstract :

In this paper, we present a new technique for automatic speech recognition (ASR) in reverberant environments. Our approach aims to enhance the logarithmic Mel power spectrum, which is computed at an intermediate stage of extracting the widely used Mel frequency cepstral coefficients (MFCCs). Given the reverberant logarithmic Mel power spectral coefficients (LMPSCs), a minimum mean square error estimate of the clean LMPSCs is computed by Bayesian inference. We employ switching linear dynamical models as an a priori model for the dynamics of the clean LMPSCs. Further, we derive a stochastic observation model which relates the clean to the reverberant LMPSCs through a simplified model of the room impulse response (RIR). This model requires only two parameters, namely RIR energy and reverberation time, which can be estimated from the captured microphone signal. The performance of the proposed enhancement technique is studied on the AURORA5 database and compared to that of constrained maximum-likelihood linear regression (CMLLR). Experimental results show that our approach significantly outperforms CMLLR and that up to 80% of the errors caused by the reverberation are recovered. The approach is compatible with standard MFCC feature vectors and leaves the ASR back-end unchanged. It is of moderate computational complexity and suitable for real-time applications.
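
The abstract states that the simplified RIR model needs only two parameters, RIR energy and reverberation time, but this record does not give the model itself. As an illustration only, the sketch below assumes a commonly used form: white Gaussian noise shaped by an exponential decay whose energy envelope falls by 60 dB after the reverberation time. The function names `synth_rir` and `envelope_drop_db`, the 16 kHz sampling rate, and the choice of truncating the response at the reverberation time are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def synth_rir(t60, energy, fs=16000, length_s=None, seed=0):
    """Draw one RIR from an exponentially decaying white-noise model.

    This model form is an assumption for illustration; the record does
    not specify the paper's exact RIR model.
    t60    -- reverberation time in seconds
    energy -- desired sum of squared RIR taps
    """
    if length_s is None:
        length_s = t60  # assumed truncation point
    n = np.arange(int(round(length_s * fs)))
    # Amplitude decay rate chosen so the energy envelope exp(-2*decay*n)
    # has dropped by 60 dB at n = t60 * fs.
    decay = 3.0 * np.log(10.0) / (t60 * fs)
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(n.size) * np.exp(-decay * n)
    # Rescale so the total energy matches the requested value.
    h *= np.sqrt(energy / np.sum(h ** 2))
    return h


def envelope_drop_db(t60, fs=16000):
    """Drop of the model's energy envelope, in dB, between 0 and t60."""
    decay = 3.0 * np.log(10.0) / (t60 * fs)
    return -10.0 * np.log10(np.exp(-2.0 * decay * t60 * fs))


if __name__ == "__main__":
    h = synth_rir(t60=0.5, energy=2.0)
    print(h.size, float(np.sum(h ** 2)), envelope_drop_db(0.5))
```

Under these assumptions, the two scalars fully determine the statistics of the response: the reverberation time sets the decay rate and the energy sets the overall scale, which is consistent with the abstract's remark that both can be estimated from the microphone signal.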

Keywords :

belief networks; least mean squares methods; maximum likelihood estimation; regression analysis; reverberation; speech recognition; stochastic processes; ASR; AURORA5 database; Bayesian inference; CMLLR; LMPSC computation; MFCC feature vectors; Mel frequency cepstral coefficients; RIR energy; automatic speech recognition; computational complexity; constrained maximum likelihood linear regression; logarithmic Mel power spectrum; microphone signal; minimum mean square error estimation; model-based feature enhancement; reverberant speech recognition; room impulse response; stochastic observation model; Automatic speech recognition; Bayesian methods; Databases; Energy capture; Mean square error methods; Mel frequency cepstral coefficient; Microphones; Reverberation; Speech recognition; Stochastic processes; Automatic speech recognition (ASR); feature enhancement; reverberant speech recognition;

Language :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2010.2049684

Filename :

5461033

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1486160