DocumentCode
857188
Title
Alignment-based codeword-dependent cepstral normalization
Author
Huerta, Juan Manuel
Author_Institution
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Volume
10
Issue
7
fYear
2002
fDate
10/1/2002 12:00:00 AM
Firstpage
451
Lastpage
459
Abstract
This paper proposes the alignment-based codeword-dependent cepstral normalization (ACDCN) algorithm, which aims to alleviate the acoustical mismatch that occurs when a speech recognizer faces environmental conditions not observed in the training data. ACDCN is based on the linear channel model of the environment originally proposed by Acero (1990) and on the CDCN solution to this model. ACDCN replaces the codebook (Gaussian mixture model) employed by CDCN with the state distributions employed by the recognizer's HMMs, under the assumption that these HMM distributions will model the associated speech segments better than the general GMM distribution. The feature-frame to HMM-state association is obtained through an alignment of a first decoding-pass hypothesis. From this alignment, ACDCN obtains an estimate of the environmental parameters (noise and channel vectors), which are then employed to obtain an MMSE estimate of the clean speech vectors, in a way similar to Acero's method. ACDCN produces an overall reduction of the error rate of over 30% in the noise range of 0 to 20 dB in experiments conducted on the Aurora-2 noisy digits database.
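The MMSE compensation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the commonly cited form of Acero's model, z = x + q + r(x, n, q), and simplifies it to log-spectral features so that the correction reduces to the elementwise r = log(1 + exp(n - q - x)). It also assumes that the noise vector n and channel vector q are already estimated, that the frame has already been aligned to one HMM state with a diagonal-covariance Gaussian mixture, and that variances are unchanged by the environment. All function and parameter names are hypothetical.

```python
import numpy as np

def correction(mu, n, q):
    """Elementwise correction r = log(1 + exp(n - q - mu)) (log-spectral assumption)."""
    return np.logaddexp(0.0, n - q - mu)

def mmse_clean_estimate(z, means, variances, weights, n, q):
    """Estimate the clean vector for one noisy frame z.

    z:         (D,)   noisy feature frame
    means:     (K, D) clean-speech Gaussian means of the aligned state
    variances: (K, D) diagonal variances (assumed unchanged by the environment)
    weights:   (K,)   mixture weights
    n, q:      (D,)   assumed-known noise and channel vectors
    """
    r = correction(means, n, q)        # (K, D) per-component correction
    noisy_means = means + q + r        # clean means shifted into the noisy domain
    # Log-likelihood of z under each shifted diagonal Gaussian.
    log_lik = -0.5 * np.sum(
        (z - noisy_means) ** 2 / variances + np.log(2.0 * np.pi * variances),
        axis=1,
    )
    log_post = np.log(weights) + log_lik
    log_post -= np.logaddexp.reduce(log_post)   # normalize to posteriors
    post = np.exp(log_post)                     # (K,)
    # MMSE estimate: remove the channel and the posterior-weighted correction.
    return z - q - post @ r
```

With negligible noise the correction vanishes and the estimate reduces to z - q; in the alignment-based setting, `means`, `variances`, and `weights` would come from the HMM state assigned to each frame by the first-pass alignment rather than from a single global codebook.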
Keywords
cepstral analysis; data compression; hidden Markov models; least mean squares methods; noise; speech coding; speech recognition; statistical analysis; Aurora-2 noisy digits database; GMM distribution; Gaussian mixture model; HMM distributions; HMM-state association; MLE; MMSE; acoustic modeling; acoustical mismatch; alignment-based codeword-dependent cepstral normalization; cepstral normalization algorithm; channel vectors; clean speech vectors; decoding-pass hypothesis; environmental conditions; environmental parameters; error rate reduction; feature-frame; linear channel model; maximum likelihood estimation; noise vectors; speech segments model; state distributions; training data; Cepstral analysis; Decoding; Error analysis; Face recognition; Hidden Markov models; Noise reduction; Speech enhancement; Speech recognition; Training data; Working environment noise;
fLanguage
English
Journal_Title
Speech and Audio Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1063-6676
Type
jour
DOI
10.1109/TSA.2002.804305
Filename
1045277