Regional Information Center for Science and Technology - Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor

DocumentCode :

1144071

Title :

Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor

Author :

Yu, Dong ; Deng, Li ; Droppo, Jasha ; Wu, Jian ; Gong, Yifan ; Acero, Alex

Author_Institution :

Microsoft Corp., Redmond, WA

Volume :

Issue :

Year :

2008

Date :

7/1/2008

Firstpage :

1061

Lastpage :

1070

Abstract :

We present an efficient and effective nonlinear feature-domain noise suppression algorithm, motivated by the minimum-mean-square-error (MMSE) optimization criterion, for noise-robust speech recognition. Unlike the log-MMSE spectral amplitude noise suppressor proposed by Ephraim and Malah (E&M), our new algorithm aims to minimize the error expressed explicitly for the Mel-frequency cepstra instead of discrete Fourier transform (DFT) spectra, and it operates on the Mel-frequency filter bank's output. As a consequence, the statistics used to estimate the suppression factor become vastly different from those used in the E&M log-MMSE suppressor. Our algorithm is significantly more efficient than the E&M log-MMSE suppressor since the number of channels in the Mel-frequency filter bank is much smaller (23 in our case) than the number of bins (256) in the DFT. We have conducted extensive speech recognition experiments on the standard Aurora-3 task. The experimental results demonstrate a reduction of the recognition word error rate by 48% over the standard ICSLP02 baseline, 26% over the cepstral mean normalization baseline, and 13% over the popular E&M log-MMSE noise suppressor. The experiments also show that our new algorithm performs slightly better than the ETSI advanced front end (AFE) on the well-matched and mid-mismatched settings, and has 8% and 10% fewer errors than our earlier SPLICE (stereo-based piecewise linear compensation for environments) system on these settings, respectively.
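The structural idea in the abstract, applying a suppression gain to the 23 Mel filter-bank channels before the log and cepstral transform instead of to 256 DFT bins, can be sketched roughly as follows. This is only an illustration under stated assumptions: the gain rule here is a plain Wiener-style stand-in, not the cepstral-MMSE gain derived in the paper, and the function names, the gain floor, the noise estimate, and the choice of 13 cepstral coefficients are all hypothetical.

```python
import numpy as np

NUM_MEL_CHANNELS = 23  # channel count quoted in the abstract


def dct_ii(x, num_ceps):
    """Unnormalized DCT-II along the last axis, keeping num_ceps coefficients."""
    n = x.shape[-1]
    k = np.arange(num_ceps)[:, None]            # cepstral index, shape (num_ceps, 1)
    m = np.arange(n)[None, :]                   # channel index, shape (1, n)
    basis = np.cos(np.pi * k * (m + 0.5) / n)   # shape (num_ceps, n)
    return x @ basis.T


def suppress_mel_channels(mel_power, noise_power, floor=0.1):
    """Scale each Mel filter-bank power output by a per-channel gain.

    ASSUMPTION: a Wiener-style gain is used purely as a placeholder;
    the paper's MMSE-motivated gain is not reproduced here.
    """
    # Crude a priori SNR estimate: a posteriori SNR minus one, clipped at zero.
    snr = np.maximum(mel_power / noise_power - 1.0, 0.0)
    # Gain lies in [floor, 1); the floor (hypothetical value) limits over-suppression.
    gain = np.maximum(snr / (1.0 + snr), floor)
    return gain * mel_power, gain


def mfcc_from_mel(mel_power, num_ceps=13):
    """Log-compress the filter-bank outputs and take the DCT to get cepstra."""
    return dct_ii(np.log(np.maximum(mel_power, 1e-10)), num_ceps)


# Synthetic data: 5 frames of 23-channel Mel power with a flat noise estimate.
rng = np.random.default_rng(0)
noise = np.ones(NUM_MEL_CHANNELS)
mel = noise * rng.uniform(0.5, 20.0, size=(5, NUM_MEL_CHANNELS))

enhanced, gain = suppress_mel_channels(mel, noise)
ceps = mfcc_from_mel(enhanced)
print(gain.shape, ceps.shape)
```

Whatever gain rule is used, only 23 gains are computed per frame in this arrangement, compared with 256 for a suppressor that works on DFT bins, which is the efficiency argument the abstract makes.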

Keywords :

Fourier transform spectra; cepstral analysis; channel bank filters; discrete Fourier transforms; interference suppression; least mean squares methods; piecewise linear techniques; signal denoising; speech recognition; Mel-frequency cepstra; cepstral mean normalization baseline; discrete Fourier transform spectra; filter banks output; minimum-mean-square-error optimization criterion; nonlinear feature-domain noise suppression algorithm; robust speech recognition; spectral amplitude noise suppressor; stereo-based piecewise linear compensation; word error rate; Mel-frequency cepstral coefficient (MFCC); minimum-mean-square-error (MMSE) estimate; noise reduction; phase asynchrony; robust automatic speech recognition (ASR);

Language :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2008.921761

Filename :

4497834

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1144071