DocumentCode :
1330965
Title :
Frequency-domain maximum likelihood estimation for automatic speech recognition in additive and convolutive noises
Author :
Zhao, Yunxin
Author_Institution :
Dept. of Comput. Eng. & Comput. Sci., Missouri Univ., Columbia, MO, USA
Volume :
8
Issue :
3
fYear :
2000
fDate :
5/1/2000
Firstpage :
255
Lastpage :
266
Abstract :
A feature estimation technique is proposed for speech signals that are degraded by both additive and convolutive noises. An EM algorithm is formulated in the frequency domain to identify the magnitude response of the distortion channel and the power spectrum of the additive noise, and posterior estimates of the short-time power spectra of speech are obtained from the identified channel and noise. The estimated posterior power spectra are used to calculate perceptually based linear prediction cepstral coefficients, and the estimated cepstral features and their temporal regression coefficients are used for automatic speech recognition with acoustic models trained on clean speech. Experiments were performed on speaker-independent continuous speech recognition, where the speech data were taken from the TIMIT database and were degraded by a distortion channel and simulated additive noises with white or colored spectral characteristics at various SNR levels. Experimental results indicate that the proposed technique leads to convergent identification of channel and noise and significantly improved recognition accuracy for speaker-independent continuous speech
Keywords :
cepstral analysis; convolution; frequency-domain analysis; maximum likelihood estimation; optimisation; prediction theory; speech recognition; telecommunication channels; white noise; EM algorithm; SNR levels; TIMIT database; acoustic models; additive noise; automatic speech recognition; cepstral features; clean speech; colored spectral characteristics; convolutive noise; distortion channel; experiments; feature estimation; frequency-domain maximum likelihood estimation; linear prediction cepstral coefficients; magnitude response identification; posterior estimates; posterior power spectra; power spectrum; recognition accuracy; short-time power spectra; speaker independent continuous speech recognition; speaker-independent continuous speech; speech data; speech signals; temporal regression coefficients; white spectral characteristics; Acoustic distortion; Acoustic noise; Additive noise; Automatic speech recognition; Cepstral analysis; Degradation; Frequency estimation; Maximum likelihood estimation; Speech enhancement; Speech recognition;
fLanguage :
English
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-6676
Type :
jour
DOI :
10.1109/89.841208
Filename :
841208