Title :
Speaker adaptation for telephony data using speaker clustering
Author :
Wu, Cheng ; Lubensky, D. ; Wang, Zhong-Hua
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
This paper reports an ongoing effort to develop an unsupervised on-line speaker adaptation method for the telephony environment. All speakers in the training corpus are acoustically pre-clustered, and a cluster-dependent system is built for each cluster. When a new telephony test speaker is presented, the cluster closest to that speaker is selected using an improved distance measure. Starting from the selected cluster, an MLLR (maximum likelihood linear regression) adaptation algorithm with a block-diagonal transformation is applied to move the cluster model closer to the test speaker. In telephony applications the adaptation data can be very short or noisy, so the MLLR-adapted means can be unreliable. A MAP-like weighting scheme for MLLR adaptation is therefore applied to keep the adapted means reliable when the adaptation data is very short.
Keywords :
hidden Markov models; maximum likelihood estimation; pattern clustering; speech recognition; statistical analysis; telephony; MAP-like weighting scheme; MLLR adaptation algorithm; adapted mean reliability; block diagonal transformation; cluster-dependent system; distance measure; maximum likelihood linear regression; pre-clustering; speaker adaptation; speaker clustering; telephony data; unsupervised on-line speaker adaptation method; Acoustic measurements; Acoustic testing; Hidden Markov models; Loudspeakers; Maximum likelihood linear regression; Multiaccess communication; Speech recognition; Telephony; Time division multiple access; Training data;
Conference_Title :
Signal Processing Proceedings, 2000. WCCC-ICSP 2000. 5th International Conference on
Conference_Location :
Beijing
Print_ISBN :
0-7803-5747-7
DOI :
10.1109/ICOSP.2000.891625