Regional Information Center for Science and Technology - Frequency-domain maximum likelihood estimation for automatic speech recognition in additive and convolutive noises

DocumentCode :

1330965

Title :

Frequency-domain maximum likelihood estimation for automatic speech recognition in additive and convolutive noises

Author :

Zhao, Yunxin

Author_Institution :

Dept. of Comput. Eng. & Comput. Sci., Missouri Univ., Columbia, MO, USA

Volume :

Issue :

fYear :

2000

fDate :

5/1/2000

Firstpage :

255

Lastpage :

266

Abstract :

A feature estimation technique is proposed for speech signals degraded by both additive and convolutive noise. An EM algorithm is formulated in the frequency domain to identify the magnitude response of the distortion channel and the power spectrum of the additive noise, and posterior estimates of the short-time power spectra of speech are obtained from the identified channel and noise. The estimated posterior power spectra are used to calculate perceptually based linear prediction cepstral coefficients, and the estimated cepstral features and their temporal regression coefficients are used for automatic speech recognition with acoustic models trained on clean speech. Experiments were performed on speaker-independent continuous speech recognition, with speech data taken from the TIMIT database and degraded by a distortion channel and simulated additive noise with white or colored spectral characteristics at various SNR levels. Experimental results indicate that the proposed technique leads to convergent identification of the channel and noise and to significantly improved recognition accuracy for speaker-independent continuous speech.

Keywords :

cepstral analysis; convolution; frequency-domain analysis; maximum likelihood estimation; optimisation; prediction theory; speech recognition; telecommunication channels; white noise; EM algorithm; SNR levels; TIMIT database; acoustic models; additive noise; automatic speech recognition; cepstral features; clean speech; colored spectral characteristics; convolutive noise; distortion channel; experiments; feature estimation; frequency-domain maximum likelihood estimation; linear prediction cepstral coefficients; magnitude response identification; posterior estimates; posterior power spectra; power spectrum; recognition accuracy; short-time power spectra; speaker independent continuous speech recognition; speaker-independent continuous speech; speech data; speech signals; temporal regression coefficients; white spectral characteristics; Acoustic distortion; Acoustic noise; Degradation; Frequency estimation; Speech enhancement

fLanguage :

English

Journal_Title :

Speech and Audio Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1063-6676

Type :

jour

DOI :

10.1109/89.841208

Filename :

841208

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1330965