Title :
Mask estimation employing Posterior-based Representative Mean for missing-feature speech recognition with time-varying background noise
Author :
Kim, Wooil ; Hansen, John H L
Author_Institution :
Center for Robust Speech Syst. (CRSS), Univ. of Texas at Dallas, Richardson, TX, USA
Date :
Nov. 13 2009-Dec. 17 2009
Abstract :
This paper proposes a novel mask estimation method for missing-feature reconstruction that improves speech recognition performance in time-varying background noise. Conventional mask estimation methods based on noise estimates and spectral subtraction fail to estimate the mask reliably in such conditions. The proposed method determines the reliability of the input speech spectrum using a posterior-based representative mean (PRM) vector, which is obtained as a sum of the mean parameters of the speech model weighted by their posterior probabilities. The noise-corrupted speech model is obtained with a model combination method proposed in our previous study on feature compensation. Experimental results demonstrate that the proposed mask estimation method is considerably more effective than conventional mask estimation methods at improving speech recognition performance in time-varying background noise. Employing the proposed PRM-based mask estimation for missing-feature reconstruction yields average relative WER improvements of 36.29% and 30.45% in speech babble and background music conditions, respectively, compared to conventional mask estimation methods.
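The PRM computation described in the abstract can be illustrated with a minimal sketch. It assumes diagonal-covariance Gaussian mixtures in a log-spectral domain, with posteriors computed under the noise-corrupted model and applied to the clean-speech means. The function names, the toy parameters, and the reliability rule in `estimate_mask` (a fixed margin between the observed frame and the PRM) are hypothetical choices made for illustration only; the abstract does not specify them, and this is not the authors' implementation.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given frame x."""
    log_joint = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(log_joint)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lj - peak) for lj in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def representative_mean(post, clean_means):
    """Sum of the clean-speech component means weighted by the posteriors."""
    dim = len(clean_means[0])
    return [sum(p * m[d] for p, m in zip(post, clean_means)) for d in range(dim)]


def estimate_mask(x, prm, margin=0.5):
    """Hypothetical rule: a band is reliable when the observation does not
    exceed the PRM by more than `margin` (assumption, not from the paper)."""
    return [(xi - pi) <= margin for xi, pi in zip(x, prm)]


# Toy two-component, two-band models (illustrative values only).
weights = [0.5, 0.5]
noisy_means = [[1.0, 1.0], [5.0, 5.0]]   # noise-corrupted speech model
noisy_vars = [[1.0, 1.0], [1.0, 1.0]]
clean_means = [[0.0, 0.5], [4.0, 4.9]]   # clean speech model

frame = [5.0, 5.0]                        # observed noisy log-spectrum
post = posteriors(frame, weights, noisy_means, noisy_vars)
prm = representative_mean(post, clean_means)
mask = estimate_mask(frame, prm)
```

In this toy setup the second component dominates the posterior, so the PRM lies close to that component's clean mean, and the first band is flagged unreliable because the observation sits well above the PRM there.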
Keywords :
probability; signal reconstruction; speech intelligibility; speech recognition; feature compensation; mask estimation; mean parameter; missing-feature reconstruction; noise estimates; noise-corrupted speech model; posterior probability; posterior-based representative mean; spectral subtraction; speech spectrum; time-varying background noise; Background noise; Degradation; Frequency estimation; Noise robustness; Reliability engineering; Signal to noise ratio; Speech coding; Speech enhancement; Speech recognition; Working environment noise
Conference_Title :
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on
Conference_Location :
Merano
Print_ISBN :
978-1-4244-5478-5
Electronic_ISBN :
978-1-4244-5479-2
DOI :
10.1109/ASRU.2009.5373398