Regional Information Center for Science and Technology - Alignment-based codeword-dependent cepstral normalization

DocumentCode :

857188

Title :

Alignment-based codeword-dependent cepstral normalization

Author :

Huerta, Juan Manuel

Author_Institution :

IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA

fYear :

2002

fDate :

10/1/2002

Firstpage :

451

Lastpage :

459

Abstract :

This paper proposes the alignment-based codeword-dependent cepstral normalization algorithm (ACDC_N), which aims to alleviate the acoustical mismatch that occurs when a speech recognizer faces environmental conditions not observed in the training data. ACDC_N is based on the linear channel model of the environment originally proposed by Acero (1990) and on the CDCN solution to this model. ACDC_N replaces the codebook (Gaussian mixture model) employed by CDCN with the state distributions of the recognizer's HMMs, under the assumption that these HMM distributions model the associated speech segments better than the general GMM does. The association between feature frames and HMM states is obtained by aligning a first-pass decoding hypothesis. From this alignment, ACDC_N estimates the environmental parameters (noise and channel vectors), which are then used to obtain an MMSE estimate of the clean speech vectors, in a way similar to Acero's method. In experiments on the Aurora-2 noisy digits database, ACDC_N reduces the overall error rate by over 30% in the SNR range of 0 to 20 dB.
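The compensation idea can be illustrated with a minimal NumPy sketch. It is not the paper's implementation and rests on several stated assumptions. Features are log filterbank (log-spectral) vectors rather than cepstra (no DCT). The environment follows the commonly used log-spectral form of Acero's model, y = x + q + log(1 + exp(n - x - q)), with noise n and channel q. The noise vector n is treated as known (for example, estimated from non-speech frames). Only q is re-estimated, by a simple fixed-point average over aligned frames. Each HMM state is a diagonal GMM whose variances are left uncompensated. The function names `correction`, `component_posteriors` and `acdcn_sketch` are hypothetical. What the sketch shows is the key substitution: the first-pass alignment decides which state's Gaussians, instead of a global CDCN codebook, supply the clean-speech means used to compute the correction vectors for the MMSE estimate x_hat = y - sum_k p(k|y) r_k.

```python
import numpy as np


def correction(mu, n, q):
    """Correction vector r = q + log(1 + exp(n - mu - q)), so that the noisy
    log-spectrum is y ~= x + r when evaluated at clean x = mu (assumed model)."""
    return q + np.logaddexp(0.0, n - mu - q)


def component_posteriors(y, means, variances, weights, n, q):
    """Posterior p(k | y) over the Gaussians of ONE aligned HMM state.
    Each mean is shifted by its correction vector; variances are left
    unchanged, which is a simplifying assumption of this sketch."""
    shifted = means + correction(means, n, q)                     # (K, D)
    ll = -0.5 * np.sum((y - shifted) ** 2 / variances
                       + np.log(2.0 * np.pi * variances), axis=1)
    ll += np.log(weights)
    ll -= ll.max()                                                # numerical stability
    p = np.exp(ll)
    return p / p.sum()


def acdcn_sketch(Y, alignment, states, n, n_iter=20):
    """Hypothetical alignment-based compensation.

    Y         : (T, D) noisy log-spectra
    alignment : length-T sequence of state ids from a first-pass decode
    states    : dict id -> (means (K, D), variances (K, D), weights (K,))
    n         : (D,) noise log-spectrum, assumed known here
    Returns the estimated channel q (D,) and MMSE clean estimates (T, D).
    """
    T, D = Y.shape
    q = np.zeros(D)
    # Fixed-point re-estimation of the channel from the aligned frames
    # (a simplification, not the paper's estimation procedure).
    for _ in range(n_iter):
        acc = np.zeros(D)
        for y, s in zip(Y, alignment):
            means, variances, weights = states[s]
            p = component_posteriors(y, means, variances, weights, n, q)
            # Per-component channel residual y - mu_k - log(1 + exp(n - mu_k - q))
            resid = y - means - np.logaddexp(0.0, n - means - q)
            acc += p @ resid
        q = acc / T
    # MMSE-style clean estimate: subtract the posterior-weighted correction.
    X_hat = np.empty_like(Y)
    for t, (y, s) in enumerate(zip(Y, alignment)):
        means, variances, weights = states[s]
        p = component_posteriors(y, means, variances, weights, n, q)
        X_hat[t] = y - p @ correction(means, n, q)
    return q, X_hat
```

Because the alignment fixes the state for each frame, the posterior is computed only over that state's few Gaussians rather than over a global codebook. With single-Gaussian states it collapses to 1, and the estimate reduces to subtracting a single correction vector per frame.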

Keywords :

cepstral analysis; data compression; hidden Markov models; least mean squares methods; noise; speech coding; speech recognition; statistical analysis; Aurora-2 noisy digits database; GMM distribution; Gaussian mixture model; HMM distributions; HMM-state association; MLE; MMSE; acoustic modeling; acoustical mismatch; alignment-based codeword-dependent cepstral normalization; cepstral normalization algorithm; channel vectors; clean speech vectors; decoding-pass hypothesis; environmental conditions; environmental parameters; error rate reduction; feature-frame; linear channel model; maximum likelihood estimation; noise vectors; speech segments model; state distributions; training data; Cepstral analysis; Decoding; Error analysis; Face recognition; Hidden Markov models; Noise reduction; Speech enhancement; Speech recognition; Training data; Working environment noise;

fLanguage :

English

Journal_Title :

Speech and Audio Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1063-6676

Type :

jour

DOI :

10.1109/TSA.2002.804305

Filename :

1045277

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=857188