DocumentCode :
3455510
Title :
Unsupervised language model adaptation using n-gram weighting
Author :
Haidar, Md Akmal ; O'Shaughnessy, D.
Author_Institution :
INRS-EMT, Montreal, QC, Canada
fYear :
2011
fDate :
8-11 May 2011
Abstract :
In this paper, we introduce a weighting of topic models for mixture language model adaptation that uses the n-grams of the topic models. Topic clusters are formed by a hard-clustering method that assigns each document to a single topic, chosen as the topic contributing the largest number of that document's words in a Latent Dirichlet Allocation (LDA) analysis. The n-grams of the topics generated by hard clustering are then used to compute the mixture weights of the component topic models. Instead of using all the words of the training vocabulary, the LDA analysis uses a subset of words selected with information retrieval techniques. The proposed n-gram weighting approach yields significant reductions in perplexity and word error rate (WER) over the unigram weighting approach used in the literature.
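The abstract's pipeline (LDA hard clustering followed by n-gram-based mixture weighting) can be illustrated with a minimal sketch. This is not the paper's implementation: the toy `word_topic` map stands in for a real LDA inference step, the corpus is invented, and add-one smoothing is an assumption; it only shows the mechanics of assigning each document to its dominant topic and weighting topic models by the bigram probability they assign to adaptation text.

```python
from collections import Counter, defaultdict

# Hypothetical word -> topic assignments; in the paper these would come
# from an LDA analysis over a selected subset of the training vocabulary.
word_topic = {"goal": 0, "match": 0, "team": 0,
              "stock": 1, "market": 1, "price": 1}

docs = [
    "team goal match goal price".split(),
    "stock market price stock".split(),
]

# Hard clustering: assign each document to the single topic that
# contributes the largest number of its words.
def hard_cluster(doc):
    counts = Counter(word_topic[w] for w in doc if w in word_topic)
    return counts.most_common(1)[0][0]

clusters = defaultdict(list)
for doc in docs:
    clusters[hard_cluster(doc)].append(doc)

vocab = set(word_topic)

# Per-topic bigram model with add-one smoothing (an assumed choice;
# the paper does not specify smoothing in the abstract).
def bigram_model(topic_docs):
    bigrams, unigrams = Counter(), Counter()
    for doc in topic_docs:
        unigrams.update(doc)
        bigrams.update(zip(doc, doc[1:]))
    return bigrams, unigrams

models = {t: bigram_model(ds) for t, ds in clusters.items()}

def bigram_prob(model, w1, w2):
    bigrams, unigrams = model
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + len(vocab))

# n-gram weighting: each topic model's mixture weight is proportional
# to the probability it assigns to the adaptation text's bigrams.
def mixture_weights(text):
    scores = {}
    for t, model in models.items():
        p = 1.0
        for w1, w2 in zip(text, text[1:]):
            p *= bigram_prob(model, w1, w2)
        scores[t] = p
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}

weights = mixture_weights("stock market price".split())
print(weights)  # the finance topic (1) receives the larger weight
```

A unigram weighting baseline would score each topic by unigram probabilities alone; the bigram scoring above is the kind of higher-order evidence the proposed approach exploits.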
Keywords :
information retrieval; pattern clustering; statistical analysis; text analysis; unsupervised learning; LDA analysis; component topic models; hard clustering method; information retrieval; latent Dirichlet allocation analysis; mixture language model adaptation; mixture weights; n-gram weighting approach; training vocabulary; unsupervised language model adaptation; word error rate; Adaptation models; Computational modeling; Mathematical model; Probabilistic logic; Semantics; Training; Vocabulary; Mixture models; language model adaptation; latent Dirichlet allocation; speech recognition;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Electrical and Computer Engineering (CCECE), 2011 24th Canadian Conference on
Conference_Location :
Niagara Falls, ON
ISSN :
0840-7789
Print_ISBN :
978-1-4244-9788-1
Electronic_ISBN :
0840-7789
Type :
conf
DOI :
10.1109/CCECE.2011.6030578
Filename :
6030578