Discriminative cluster adaptive training

Author

Yu, Kai; Gales, Mark J. F.

Author_Institution

Eng. Dept., Cambridge Univ.

Volume

14

Issue

5

Year

2006

Firstpage

1694

Lastpage

1703

Abstract

Multiple-cluster schemes, such as cluster adaptive training (CAT) or eigenvoice systems, are a popular approach for rapid speaker and environment adaptation. Interpolation weights are used to transform a multiple-cluster, canonical, model to a standard hidden Markov model (HMM) set representative of an individual speaker or acoustic environment. Maximum likelihood training for CAT has previously been investigated. However, in state-of-the-art large vocabulary continuous speech recognition systems, discriminative training is commonly employed. This paper investigates applying discriminative training to multiple-cluster systems. In particular, minimum phone error (MPE) update formulae for CAT systems are derived. In order to use MPE in this case, modifications to the standard MPE smoothing function and the prior distribution associated with MPE training are required. A more complex adaptive training scheme combining both interpolation weights and linear transforms, a structured transform (ST), is also discussed within the MPE training framework. Discriminatively trained CAT and ST systems were evaluated on a state-of-the-art conversational telephone speech task. These multiple-cluster systems were found to outperform both standard and adaptively trained systems.
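The interpolation step the abstract describes can be illustrated with a small sketch. It assumes the usual CAT mean formulation, which this record does not spell out: for each Gaussian component, the speaker- or environment-specific mean is a weighted sum of that component's means in the P clusters, mu_s = sum_p lambda_p * mu_p, with other parameters shared. The helper name `cat_adapted_mean` is hypothetical, and the sketch covers only the interpolation, not the paper's MPE or ST update formulae.

```python
from typing import List, Sequence


def cat_adapted_mean(cluster_means: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    # Hypothetical helper (assumed CAT formulation): mu_s = sum_p lambda_p * mu_p
    # for a single Gaussian component.
    # cluster_means[p] is that component's mean vector in cluster p of the
    # canonical model; weights[p] is the interpolation weight lambda_p for
    # one speaker or acoustic environment.
    if not cluster_means:
        raise ValueError("need at least one cluster")
    if len(cluster_means) != len(weights):
        raise ValueError("need exactly one interpolation weight per cluster")
    dim = len(cluster_means[0])
    if any(len(mu) != dim for mu in cluster_means):
        raise ValueError("all cluster means must have the same dimension")
    # Each output dimension is the weighted sum of that dimension across clusters.
    return [sum(w * mu[d] for w, mu in zip(weights, cluster_means))
            for d in range(dim)]


# Two clusters with 2-dimensional means and equal weights for one speaker.
print(cat_adapted_mean([[1.0, 2.0], [3.0, 0.0]], [0.5, 0.5]))  # [2.0, 1.0]
```

Under this assumed formulation, adapting to a new speaker means estimating only the P interpolation weights, rather than every mean in the HMM set. This small parameter count is what makes multiple-cluster schemes suited to rapid adaptation.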

Keywords

eigenvalues and eigenfunctions; hidden Markov models; interpolation; pattern clustering; speech recognition; statistical analysis; transforms; continuous speech recognition systems; discriminative cluster adaptive training; eigenvoice systems; hidden Markov model; interpolation weights; linear transforms; maximum likelihood training; minimum phone error update formulae; multiple-cluster schemes; smoothing function; state-of-the-art conversational telephone speech task; state-of-the-art large vocabulary; structured transform; Error correction; Hidden Markov models; Interpolation; Loudspeakers; Maximum likelihood estimation; Maximum likelihood linear regression; Smoothing methods; Speech recognition; Telephony; Vocabulary; Cluster adaptive training (CAT); discriminative training; eigenvoices; minimum phone error (MPE); multiple-cluster HMM

Language

English

Journal_Title

IEEE Transactions on Audio, Speech, and Language Processing

Publisher

IEEE

ISSN

1558-7916

Type

Journal article

DOI

10.1109/TSA.2005.858555

Filename

1677989