Regional Information Center for Science and Technology (RICEST) - Mask estimation employing Posterior-based Representative Mean for missing-feature speech recognition with time-varying background noise

DocumentCode :

2973697

Title :

Mask estimation employing Posterior-based Representative Mean for missing-feature speech recognition with time-varying background noise

Author :

Kim, Wooil ; Hansen, John H L

Author_Institution :

Center for Robust Speech Syst. (CRSS), Univ. of Texas at Dallas, Richardson, TX, USA

fYear :

2009

fDate :

Nov. 13 2009-Dec. 17 2009

Firstpage :

194

Lastpage :

198

Abstract :

This paper proposes a novel mask estimation method for missing-feature reconstruction to improve speech recognition performance in time-varying background noise conditions. Conventional mask estimation methods based on noise estimates and spectral subtraction fail to estimate the mask reliably. The proposed method uses a posterior-based representative mean (PRM) vector to determine the reliability of the input speech spectrum; the PRM vector is obtained as a weighted sum of the mean parameters of the speech model, with the posterior probabilities as weights. To obtain the noise-corrupted speech model, a model combination method is employed, which was proposed in our previous study for a feature compensation method. Experimental results demonstrate that the proposed mask estimation method is considerably more effective at increasing speech recognition performance in time-varying background noise conditions. By employing the proposed PRM-based mask estimation for missing-feature reconstruction, we obtain +36.29% and +30.45% average relative improvements in word error rate (WER) for speech babble and background music conditions, respectively, compared to conventional mask estimation methods.
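
The PRM computation described in the abstract can be illustrated with a minimal sketch. The abstract states only that the PRM vector is a posterior-weighted sum of speech-model means and that a noise-corrupted speech model supplies the posteriors; everything else here is an assumption made for illustration: the diagonal-covariance Gaussian mixture, the toy parameter values, the function names, and in particular the thresholding rule in `estimate_mask`, which is a hypothetical stand-in and not the decision rule from the paper.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def posteriors(y, weights, means, variances):
    """Posterior probability of each mixture component given observation y."""
    log_joint = [
        math.log(w) + log_gauss_diag(y, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(log_joint)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lj - top) for lj in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def representative_mean(post, means):
    """Posterior-weighted sum of component means (the PRM vector)."""
    dim = len(means[0])
    return [sum(p * m[d] for p, m in zip(post, means)) for d in range(dim)]

def estimate_mask(y, prm, threshold):
    """ASSUMED rule: a band is reliable (1) when the observation exceeds
    the clean-speech PRM by no more than `threshold`, else unreliable (0)."""
    return [1 if (yi - ri) <= threshold else 0 for yi, ri in zip(y, prm)]

# Toy two-component model over two log-spectral bands (made-up values).
weights = [0.5, 0.5]
clean_means = [[1.0, 1.0], [5.0, 5.0]]
noisy_means = [[3.0, 1.2], [5.5, 5.1]]   # stand-in for the combined model
variances = [[1.0, 1.0], [1.0, 1.0]]

y = [3.0, 1.0]                            # one noisy observation frame
post = posteriors(y, weights, noisy_means, variances)
prm = representative_mean(post, clean_means)
mask = estimate_mask(y, prm, threshold=1.0)
print(post, prm, mask)
```

In this toy frame the first band sits well above the clean-speech PRM and is flagged unreliable, while the second band stays close to it and is kept. A real system would obtain `noisy_means` through the model combination step rather than by hand.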

Keywords :

probability; signal reconstruction; speech intelligibility; speech recognition; feature compensation; mask estimation; mean parameter; missing-feature reconstruction; noise estimates; noise-corrupted speech model; posterior probability; posterior-based representative mean; spectral subtraction; speech recognition; speech spectrum; time-varying background noise; Background noise; Degradation; Frequency estimation; Noise robustness; Reliability engineering; Signal to noise ratio; Speech coding; Speech enhancement; Speech recognition; Working environment noise

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on

Conference_Location :

Merano

Print_ISBN :

978-1-4244-5478-5

Electronic_ISBN :

978-1-4244-5479-2

Type :

conf

DOI :

10.1109/ASRU.2009.5373398

Filename :

5373398

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2973697