DocumentCode
1252107
Title
A frequency warping approach to speaker normalization
Author
Lee, Li ; Rose, Richard
Author_Institution
MIT, Cambridge, MA, USA
Volume
6
Issue
1
fYear
1998
fDate
1/1/1998
Firstpage
49
Lastpage
60
Abstract
In an effort to reduce the degradation in speech recognition performance caused by variation in vocal tract shape among speakers, a frequency warping approach to speaker normalization is investigated. A set of low-complexity, maximum-likelihood-based frequency warping procedures has been applied to speaker normalization for a telephone-based connected digit recognition task. This paper presents an efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filterbank in mel-frequency cepstrum feature analysis. An experimental study comparing these techniques to other well-known techniques for reducing variability is described. The results show that frequency warping is consistently able to reduce word error rate by 20%, even for very short utterances.
Keywords
band-pass filters; cepstral analysis; filtering theory; frequency estimation; hidden Markov models; maximum likelihood estimation; speech processing; speech recognition; continuous speech recognition; experiment; filterbank; hidden Markov model; linear frequency warping factor estimation; low complexity frequency warping; maximum likelihood based frequency warping; mel-frequency cepstrum feature analysis; speaker normalization; speech recognition performance; telephone based connected digit recognition; very short utterances; vocal tract shape; word error rate reduction; Cepstral analysis; Cepstrum; Degradation; Error analysis; Filter bank; Frequency estimation; Maximum likelihood estimation; Shape; Speech recognition; Telephony
fLanguage
English
Journal_Title
Speech and Audio Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1063-6676
Type
jour
DOI
10.1109/89.650310
Filename
650310