Title :
Speech recognition in non-stationary adverse environments
Author :
Wang, Zhong-Hua ; Kenny, Patrick
Author_Institution :
INRS Telecommun., Ile des Soeurs, Que., Canada
Abstract :
We introduce a new approach, called non-stationary adaptation (NA), to recognize speech in non-stationary adverse environments. Two models are used: one is a speaker-independent hidden Markov model (HMM) for clean speech; the other is an ergodic Markov chain representing the non-stationary adverse environment. Each state in the Markov chain represents one stationary adverse condition and has an associated affine transform that is estimated by maximum likelihood linear regression (MLLR). Three kinds of adverse environments are considered: (i) multi-speaker speech recognition, where the speaker identity changes randomly, which constitutes a non-stationary adverse condition; (ii) the recognition of speech corrupted by machinegun noise; and (iii) the crosstalk problem. The algorithm is tested on the Nov92 development database of WSJF0 with a vocabulary size of 20000. In multi-speaker speech recognition, NA decreases the error rate by 13.6%. For speech corrupted by machinegun noise, a one-state Markov chain decreases the error rate by 18%, and a two-state Markov chain gives another 14% decrease in error rate. In the crosstalk problem, a one-state Markov chain decreases the error rate by 16.8%. Two-state and three-state Markov chains decrease the error rate by 22% and 24.4%, respectively.
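The two-model structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes scalar features, a single clean-speech Gaussian standing in for an HMM state output density, and affine transforms that are supplied by hand rather than estimated by MLLR. All function and variable names (`adapted_mean`, `forward_loglik`, `viterbi_env`, and the demo values) are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a scalar Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def adapted_mean(mean, transform):
    """Apply an environment state's affine transform (a, b) to a clean mean."""
    a, b = transform
    return a * mean + b

def forward_loglik(frames, mean, var, transforms, trans, init):
    """Log-likelihood of the frames, summed over all environment state paths.

    Assumes every entry of trans and init is strictly positive (ergodic chain).
    """
    n = len(transforms)
    emit = lambda x, k: log_gauss(x, adapted_mean(mean, transforms[k]), var)
    alpha = [math.log(init[k]) + emit(frames[0], k) for k in range(n)]
    for x in frames[1:]:
        alpha = [
            logsumexp([alpha[j] + math.log(trans[j][k]) for j in range(n)])
            + emit(x, k)
            for k in range(n)
        ]
    return logsumexp(alpha)

def viterbi_env(frames, mean, var, transforms, trans, init):
    """Most likely environment state for each frame."""
    n = len(transforms)
    emit = lambda x, k: log_gauss(x, adapted_mean(mean, transforms[k]), var)
    score = [math.log(init[k]) + emit(frames[0], k) for k in range(n)]
    back = []
    for x in frames[1:]:
        # Best predecessor of each state, computed from the previous scores.
        prev = [max(range(n), key=lambda j: score[j] + math.log(trans[j][k]))
                for k in range(n)]
        score = [score[prev[k]] + math.log(trans[prev[k]][k]) + emit(x, k)
                 for k in range(n)]
        back.append(prev)
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    return path[::-1]

# Toy data (assumed values): state 0 leaves the mean unchanged,
# state 1 shifts it by +3 to mimic a second stationary condition.
frames = [0.1, -0.2, 3.1, 2.9, 0.0]
two_state = dict(transforms=[(1.0, 0.0), (1.0, 3.0)],
                 trans=[[0.9, 0.1], [0.1, 0.9]], init=[0.5, 0.5])
one_state = dict(transforms=[(1.0, 0.0)], trans=[[1.0]], init=[1.0])

path = viterbi_env(frames, 0.0, 1.0, **two_state)
ll_two = forward_loglik(frames, 0.0, 1.0, **two_state)
ll_one = forward_loglik(frames, 0.0, 1.0, **one_state)
```

In this toy setting the two-state chain can follow the shift in the middle frames, so it assigns the sequence a higher likelihood than a one-state chain with a single fixed transform. That only illustrates the motivation for using more than one environment state; it says nothing about the error rates reported above.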
Keywords :
Markov processes; crosstalk; error statistics; maximum likelihood estimation; speech recognition; Nov92 development database; WSJF0; affine transform; clean speech; crosstalk problem; ergodic Markov chain; error rate; machinegun noise; maximum likelihood linear regression; multi-speaker speech recognition; non-stationary adaptation; non-stationary adverse environments; one-state Markov chain; speaker-independent hidden Markov model; two-state Markov chain; vocabulary size; Crosstalk; Databases; Error analysis; Hidden Markov models; Maximum likelihood linear regression; Speech enhancement; Speech recognition; State estimation; Testing; Working environment noise
Conference_Title :
Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7803-4428-6
DOI :
10.1109/ICASSP.1998.674418