DocumentCode :
857188
Title :
Alignment-based codeword-dependent cepstral normalization
Author :
Huerta, Juan Manuel
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Volume :
10
Issue :
7
fYear :
2002
fDate :
10/1/2002
Firstpage :
451
Lastpage :
459
Abstract :
This paper proposes the alignment-based codeword-dependent cepstral normalization (ACDCN) algorithm, which aims to alleviate the acoustical mismatch that occurs when a speech recognizer faces environmental conditions not observed in the training data. ACDCN is based on the linear channel model of the environment originally proposed by Acero (1990) and on the CDCN solution to this model. ACDCN replaces the codebook (Gaussian mixture model) employed by CDCN with the state distributions employed by the recognizer's HMMs, under the assumption that these HMM distributions model the associated speech segments better than the general GMM distribution. The feature-frame to HMM-state association is obtained through an alignment of a first decoding-pass hypothesis. From this alignment, ACDCN obtains an estimate of the environmental parameters (noise and channel vectors), which are then employed to obtain an MMSE estimate of the clean speech vectors, in a way similar to Acero's method. ACDCN produces an overall reduction of the error rate of over 30% in the noise range of 0 to 20 dB in experiments conducted on the Aurora-2 noisy digits database.
Keywords :
cepstral analysis; data compression; hidden Markov models; least mean squares methods; noise; speech coding; speech recognition; statistical analysis; Aurora-2 noisy digits database; GMM distribution; Gaussian mixture model; HMM distributions; HMM-state association; MLE; MMSE; acoustic modeling; acoustical mismatch; alignment-based codeword-dependent cepstral normalization; cepstral normalization algorithm; channel vectors; clean speech vectors; decoding-pass hypothesis; environmental conditions; environmental parameters; error rate reduction; feature-frame; linear channel model; maximum likelihood estimation; noise vectors; speech segments model; state distributions; training data; Cepstral analysis; Decoding; Error analysis; Face recognition; Hidden Markov models; Noise reduction; Speech enhancement; Speech recognition; Training data; Working environment noise;
fLanguage :
English
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-6676
Type :
jour
DOI :
10.1109/TSA.2002.804305
Filename :
1045277