DocumentCode :
1161223
Title :
Discriminative cluster adaptive training
Author :
Yu, Kai ; Gales, Mark J F
Author_Institution :
Eng. Dept., Cambridge Univ.
Volume :
14
Issue :
5
fYear :
2006
Firstpage :
1694
Lastpage :
1703
Abstract :
Multiple-cluster schemes, such as cluster adaptive training (CAT) or eigenvoice systems, are a popular approach for rapid speaker and environment adaptation. Interpolation weights are used to transform a multiple-cluster canonical model into a standard hidden Markov model (HMM) set representative of an individual speaker or acoustic environment. Maximum likelihood training for CAT has previously been investigated. However, in state-of-the-art large vocabulary continuous speech recognition systems, discriminative training is commonly employed. This paper investigates applying discriminative training to multiple-cluster systems. In particular, minimum phone error (MPE) update formulae for CAT systems are derived. To use MPE in this case, modifications to the standard MPE smoothing function and the prior distribution associated with MPE training are required. A more complex adaptive training scheme combining both interpolation weights and linear transforms, a structured transform (ST), is also discussed within the MPE training framework. Discriminatively trained CAT and ST systems were evaluated on a state-of-the-art conversational telephone speech task. These multiple-cluster systems were found to outperform both standard and adaptively trained systems.
Keywords :
eigenvalues and eigenfunctions; hidden Markov models; interpolation; pattern clustering; speech recognition; statistical analysis; transforms; continuous speech recognition systems; discriminative cluster adaptive training; eigenvoice systems; hidden Markov model; interpolation weights; linear transforms; maximum likelihood training; minimum phone error update formulae; multiple-cluster schemes; smoothing function; state-of-the-art conversational telephone speech task; state-of-the-art large vocabulary; structured transform; Error correction; Hidden Markov models; Interpolation; Loudspeakers; Maximum likelihood estimation; Maximum likelihood linear regression; Smoothing methods; Speech recognition; Telephony; Vocabulary; Cluster adaptive training (CAT); discriminative training; eigenvoices; minimum phone error (MPE); multiple-cluster HMM;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TSA.2005.858555
Filename :
1677989