DocumentCode :
2066069
Title :
PLSA Based Topic Mixture Language Modeling Approach
Author :
Bai, Shuanhu ; Li, Haizhou
Author_Institution :
Inst. for Infocomm Res., Singapore, Singapore
fYear :
2008
fDate :
16-19 Dec. 2008
Firstpage :
1
Lastpage :
4
Abstract :
In this paper, we propose a method to extend the use of latent topics to higher order n-gram models. In training, the parameters of the higher order n-gram models are estimated using discounted average counts derived by applying probabilistic latent semantic analysis (PLSA) models to n-gram counts in the training corpus. In decoding, a simple yet efficient topic prediction method is introduced to predict the topic of a new document. The proposed topic mixture language model (TMLM) has two advantages over previous methods: 1) it can build topic mixture n-gram LMs (n>1), and 2) it does not require a large general baseline LM. The experimental results show that TMLMs, even with a smaller number of topics, outperform LMs built with both the standard n-gram approach and unsupervised adaptation approaches in terms of perplexity reduction.
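The core idea in the abstract can be illustrated with a small sketch. This is a hypothetical toy example, not the authors' implementation: n-gram counts are apportioned to topics using PLSA-style document-topic posteriors (here supplied by hand rather than trained), topic-conditional bigram models are estimated from the resulting fractional counts, and a new document is scored as a weighted mixture of the topic models. All names and the smoothing scheme (add-alpha) are illustrative assumptions.

```python
# Toy sketch of a topic-mixture bigram LM (hypothetical, not the paper's code).
from collections import defaultdict

def topic_counts(docs, post):
    """Fractional bigram counts per topic.
    docs: list of token lists; post[d][t] = p(topic t | doc d),
    standing in for PLSA posteriors."""
    counts = defaultdict(lambda: defaultdict(float))
    for d, toks in enumerate(docs):
        for w1, w2 in zip(toks, toks[1:]):
            for t, p in enumerate(post[d]):
                counts[t][(w1, w2)] += p  # apportion the count across topics
    return counts

def topic_bigram_prob(counts, t, w1, w2, vocab, alpha=0.1):
    """Add-alpha smoothed p_t(w2 | w1) from topic t's fractional counts."""
    num = counts[t][(w1, w2)] + alpha
    den = sum(counts[t][(w1, v)] for v in vocab) + alpha * len(vocab)
    return num / den

def mixture_prob(counts, weights, w1, w2, vocab):
    """p(w2 | w1) = sum_t gamma_t * p_t(w2 | w1), with gamma_t the
    predicted topic weights for the new document."""
    return sum(g * topic_bigram_prob(counts, t, w1, w2, vocab)
               for t, g in enumerate(weights))

docs = [["stock", "market", "rises"], ["team", "wins", "match"]]
post = [[0.9, 0.1], [0.1, 0.9]]  # assumed document-topic posteriors
vocab = {w for d in docs for w in d}
c = topic_counts(docs, post)
p = mixture_prob(c, [0.8, 0.2], "stock", "market", vocab)
assert 0.0 < p < 1.0
```

Because each smoothed topic model is a proper conditional distribution, the mixture also sums to one over the vocabulary for any fixed history, which is what makes the interpolated model a valid LM.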
Keywords :
learning (artificial intelligence); natural language processing; higher order n-gram models; probabilistic latent semantic analysis models; topic mixture language modeling; topic prediction method; training corpus; Algorithm design and analysis; Bayesian methods; Clustering algorithms; Decoding; Displays; Error analysis; Prediction methods; Singular value decomposition; Testing; Text categorization;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
6th International Symposium on Chinese Spoken Language Processing (ISCSLP '08), 2008
Conference_Location :
Kunming
Print_ISBN :
978-1-4244-2942-4
Electronic_ISBN :
978-1-4244-2943-1
Type :
conf
DOI :
10.1109/CHINSL.2008.ECP.58
Filename :
4730312