DocumentCode :
1848760
Title :
LDA-based LM adaptation using latent semantic marginals and minimum discriminant information
Author :
Haidar, Md Akmal ; O'Shaughnessy, Douglas
Author_Institution :
INRS-EMT, Montreal, QC, Canada
fYear :
2012
fDate :
27-31 Aug. 2012
Firstpage :
2040
Lastpage :
2044
Abstract :
We introduce an unsupervised language model (LM) adaptation approach using latent Dirichlet allocation (LDA) and latent semantic marginals (LSM). The LSM is a unigram probability distribution over words estimated from the LDA model. A hard-clustering method is used to form topics: each document is assigned to the topic that contributes the maximum number of its words in the LDA analysis. An LDA-adapted model is created as a weighted combination of the topic models. This LDA-adapted model is then modified using the LSM as dynamic marginals, and a new adapted model is formed via the minimum discriminant information (MDI) approach, which minimizes the distance between the new adapted model and the LDA-adapted model. We apply the LM adaptation approaches to both the original transcriptions and the automatic transcriptions (first-pass recognition output) of the test data, and observe significant perplexity and word error rate (WER) reductions over a traditional approach.
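The pipeline described in the abstract — a weighted mixture of topic unigram models, then an MDI-style rescaling toward the latent semantic marginals — can be sketched in Python. This is a minimal illustration under assumed simplifications (unigram models as plain dictionaries, a scaling exponent `beta`, function names invented here); the paper's actual formulation operates on full n-gram LMs and derives the weights from the LDA inference, which is not reproduced.

```python
def lda_adapted_unigram(topic_models, weights):
    """Weighted combination of topic unigram models (hypothetical helper).

    topic_models: list of dicts mapping word -> probability.
    weights: mixture weights, assumed to sum to 1.
    """
    vocab = set().union(*topic_models)
    return {
        w: sum(wt * tm.get(w, 0.0) for wt, tm in zip(weights, topic_models))
        for w in vocab
    }


def mdi_adapt(background, marginals, beta=0.5):
    """MDI-style adaptation sketch: scale the background model by
    (marginal / background)^beta and renormalize, so the adapted
    distribution moves toward the unigram marginals while staying
    close to the background model. beta is an assumed tuning exponent.
    """
    scaled = {
        w: p * (marginals.get(w, p) / p) ** beta
        for w, p in background.items() if p > 0
    }
    z = sum(scaled.values())
    return {w: p / z for w, p in scaled.items()}
```

For example, mixing two topic models with equal weights yields a valid distribution, and `mdi_adapt(..., beta=1.0)` recovers the marginals exactly when they cover the vocabulary.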
Keywords :
probability; speech recognition; unsupervised learning; LDA based LM adaptation; LM; MDI; WER; latent semantic marginals; minimum discriminant information; speech recognition; unigram probability distribution; unsupervised language model; word error rate; Adaptation models; Computational modeling; Decoding; Equations; Mathematical model; Semantics; Training; Latent Dirichlet allocation; latent semantic marginals; speech recognition; unsupervised LM adaptation;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Proceedings of the 20th European Signal Processing Conference (EUSIPCO), 2012
Conference_Location :
Bucharest
ISSN :
2219-5491
Print_ISBN :
978-1-4673-1068-0
Type :
conf
Filename :
6333925