DocumentCode :
900308
Title :
Minimum phone error training of precision matrix models
Author :
Sim, Khe Chai ; Gales, Mark J F
Author_Institution :
Eng. Dept., Cambridge Univ., UK
Volume :
14
Issue :
3
fYear :
2006
fDate :
5/1/2006
Firstpage :
882
Lastpage :
889
Abstract :
Gaussian mixture models (GMMs) are commonly used as the output density function for large-vocabulary continuous speech recognition (LVCSR) systems. A standard problem when using multivariate GMMs to classify data is how to accurately represent the correlations in the feature vector. Full covariance matrices yield a good model, but dramatically increase the number of model parameters. Hence, diagonal covariance matrices are commonly used. Structured precision matrix approximations provide an alternative, flexible, and compact representation. Schemes in this category include the extended maximum likelihood linear transform and subspace for precision and mean models. This paper examines how these precision matrix models can be discriminatively trained and used on state-of-the-art speech recognition tasks. In particular, the use of the minimum phone error criterion is investigated. Implementation issues associated with building LVCSR systems are also addressed. These models are evaluated and compared using large vocabulary continuous telephone speech and broadcast news English tasks.
Keywords :
Gaussian processes; matrix algebra; maximum likelihood estimation; speech recognition; transforms; Gaussian mixture models; covariance matrices; extended maximum likelihood linear transform; large-vocabulary continuous speech recognition (LVCSR); phone error training; precision matrix models; Associate members; Decorrelation; Density functional theory; Hidden Markov models; Linear discriminant analysis; Unsolicited electronic mail; Vocabulary; Discriminative training; minimum phone error; precision matrix modeling
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TSA.2005.858062
Filename :
1621201