Estimating Uncertainty to Improve Exemplar-Based Feature Enhancement for Noise Robust Speech Recognition

Author

Kallasjoki, Heikki ; Gemmeke, Jort F. ; Palomäki, Kalle J.

Author_Institution

Dept. of Signal Process. & Acoust., Aalto Univ., Aalto, Finland

Volume

22

Issue

2

fYear

2014

fDate

Feb. 2014

Firstpage

368

Lastpage

380

Abstract

We present a method for improving automatic speech recognition performance under noisy conditions by using a source separation approach to extract the underlying clean speech signal. The feature enhancement processing is complemented with heuristic estimates of the uncertainty of the source separation, which are used to further assist the recognition. The uncertainty heuristics are converted to estimates of variance for the extracted clean speech using a Gaussian mixture model based mapping, and applied in the decoding stage under the observation uncertainty framework. We propose six heuristics and evaluate them using both artificial and real-world noisy data, with acoustic models trained on clean speech, on a multi-condition noisy data set, and on the multi-condition set processed with the source separation front-end. Taking the uncertainty of the enhanced features into account is shown to improve recognition performance when the acoustic models are trained on unenhanced data, while training on enhanced noisy data yields the lowest error rates.
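The decoding step mentioned in the abstract can be illustrated with a small sketch. It assumes the common form of observation uncertainty decoding for diagonal-covariance Gaussians, in which the estimated variance of each enhanced feature is added to the acoustic model variance before the likelihood is evaluated. The function names, the toy numbers, and the diagonal-covariance assumption are illustrative and are not taken from the paper; the heuristics and the Gaussian mixture model based mapping that produce the variances are not shown.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_gaussian_with_uncertainty(x_hat, obs_var, mean, var):
    """Likelihood of an enhanced feature vector x_hat under observation uncertainty.

    Assumption (not from the paper): each model variance is inflated by the
    per-dimension variance estimate obs_var of the enhanced feature.
    """
    inflated = [v + u for v, u in zip(var, obs_var)]
    return log_gaussian_diag(x_hat, mean, inflated)


# Hypothetical toy values: the second feature is far from the model mean,
# and the front-end has flagged it as unreliable (large variance estimate).
x_hat = [1.0, 3.0]
mean = [0.0, 0.0]
var = [1.0, 1.0]
obs_var = [0.0, 8.0]

plain = log_gaussian_diag(x_hat, mean, var)
uncertain = log_gaussian_with_uncertainty(x_hat, obs_var, mean, var)
```

In this toy case the unreliable dimension is penalised less once its variance is taken into account, so `uncertain` is larger than `plain`; with all-zero variance estimates the two values coincide.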

Keywords

Gaussian processes; mixture models; source separation; speech recognition; Gaussian mixture model; automatic speech recognition performance; exemplar based feature enhancement; extracted clean speech; feature enhancement processing; heuristic estimates; multicondition noisy data set; noise robust speech recognition; observation uncertainty framework; source separation approach; source separation front end; underlying clean speech signal; Acoustics; Noise; Noise measurement; Speech; Uncertainty; Vectors; Exemplar-based; noise robustness; observation uncertainty; uncertainty estimation

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher

IEEE

ISSN

2329-9290

Type

jour

DOI

10.1109/TASLP.2013.2292328

Filename

6677576