DocumentCode :
2210183
Title :
Topic Modeling Ensembles
Author :
Shen, Zhiyong ; Luo, Ping ; Yang, Shengwen ; Shen, Xukun
Author_Institution :
Hewlett Packard Labs. China, China
fYear :
2010
fDate :
13-17 Dec. 2010
Firstpage :
1031
Lastpage :
1036
Abstract :
In this paper we propose a framework of topic modeling ensembles, a novel solution for combining the models learned by topic modeling over each partition of the whole corpus. It has potential for applications such as distributed topic modeling for large corpora and incremental topic modeling for rapidly growing corpora. Since only the base models, not the original documents, are required in the ensemble, all these applications can be performed in a privacy-preserving manner. We explore the theoretical foundation of the proposed framework, give its geometric interpretation, and implement it for both PLSA and LDA. The evaluation of the implementations over synthetic and real-life data sets shows that the proposed framework is much more efficient than modeling the original corpus directly while achieving comparable effectiveness in terms of perplexity and classification accuracy.
Keywords :
data privacy; document handling; learning (artificial intelligence); LDA; PLSA; distributed topic modeling; incremental topic modeling; latent Dirichlet allocation; privacy preserving manner; probabilistic latent semantic analysis; topic modeling ensemble; Ensemble; Topic model;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining (ICDM), 2010 IEEE 10th International Conference on
Conference_Location :
Sydney, NSW
ISSN :
1550-4786
Print_ISBN :
978-1-4244-9131-5
Electronic_ISBN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2010.113
Filename :
5694080