DocumentCode :
2711038
Title :
Collective Latent Dirichlet Allocation
Author :
Shen, Zhi-Yong ; Sun, Jun ; Shen, Yi-Dong
Author_Institution :
State Key Lab. of Comput. Sci., Chinese Acad. of Sci., Beijing
fYear :
2008
fDate :
15-19 Dec. 2008
Firstpage :
1019
Lastpage :
1024
Abstract :
In this paper, we propose a new variant of latent Dirichlet allocation (LDA), Collective LDA (C-LDA), for modeling multiple corpora. C-LDA combines multiple corpora during learning so that knowledge can be transferred from one corpus to another; meanwhile, it retains a discriminative node representing the corpus ID to constrain the learned topics within each corpus. Compared with LDA applied locally to the target corpus, C-LDA yields a refined topic-word distribution; compared with applying LDA globally and straightforwardly to the combined corpus, C-LDA keeps each topic tied to a single corpus. Experiments on several benchmark document data sets demonstrate that these advantages translate into improved performance.
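The corpus-ID constraint described in the abstract, tying each learned topic to exactly one corpus while all corpora share a common vocabulary, can be illustrated with a toy collapsed Gibbs sampler. The sketch below is an illustration only, not the authors' C-LDA: the topic partitioning, priors, function name, and sampler details are assumptions chosen for clarity, and the knowledge-transfer mechanism of C-LDA is not reproduced here.

```python
# Illustrative sketch only: a toy collapsed Gibbs sampler in which each topic is
# tied to a single corpus ID, loosely inspired by the constraint described in the
# abstract. This is NOT the paper's C-LDA implementation; the partitioning scheme,
# priors, and hyperparameters are assumptions made for illustration.
import numpy as np

def constrained_lda_gibbs(docs, corpus_ids, n_corpora, topics_per_corpus,
                          vocab_size, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """docs: list of word-id lists; corpus_ids[d] gives the corpus of document d."""
    rng = np.random.default_rng(seed)
    K = n_corpora * topics_per_corpus                 # total topics, partitioned by corpus
    topic_corpus = np.repeat(np.arange(n_corpora), topics_per_corpus)

    n_dk = np.zeros((len(docs), K))                   # document-topic counts
    n_kw = np.zeros((K, vocab_size))                  # topic-word counts (shared vocabulary)
    n_k = np.zeros(K)                                 # topic totals
    z = []                                            # topic assignment per token

    # Random initialization, restricted to each document's own corpus topics.
    for d, doc in enumerate(docs):
        allowed = np.where(topic_corpus == corpus_ids[d])[0]
        zd = rng.choice(allowed, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            allowed = np.where(topic_corpus == corpus_ids[d])[0]
            for i, w in enumerate(doc):
                k_old = z[d][i]
                n_dk[d, k_old] -= 1; n_kw[k_old, w] -= 1; n_k[k_old] -= 1
                # Only topics belonging to this document's corpus are eligible.
                p = (n_dk[d, allowed] + alpha) * (n_kw[allowed, w] + beta) \
                    / (n_k[allowed] + beta * vocab_size)
                k_new = rng.choice(allowed, p=p / p.sum())
                z[d][i] = k_new
                n_dk[d, k_new] += 1; n_kw[k_new, w] += 1; n_k[k_new] += 1

    # Per-topic word distributions; each row belongs to exactly one corpus.
    phi = (n_kw + beta) / (n_kw.sum(axis=1, keepdims=True) + beta * vocab_size)
    return phi, topic_corpus

# Tiny usage example on synthetic word ids (two corpora, shared vocabulary of 8 words).
if __name__ == "__main__":
    docs = [[0, 1, 2, 1], [2, 3, 3, 0], [4, 5, 4, 6], [6, 5, 7, 7]]
    corpus_ids = [0, 0, 1, 1]
    phi, topic_corpus = constrained_lda_gibbs(docs, corpus_ids, n_corpora=2,
                                              topics_per_corpus=2, vocab_size=8,
                                              n_iters=50)
    print(phi.shape, topic_corpus)
```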
Keywords :
classification; document handling; learning (artificial intelligence); collective latent Dirichlet allocation; document classification; knowledge transfer; machine learning; multiple corpora modeling; topic-word distribution; Computer science; Content based retrieval; Data mining; Information retrieval; Laboratories; Linear discriminant analysis; Machine learning; Natural language processing; Text mining; Web pages; collective LDA;
fLanguage :
English
Publisher :
ieee
Conference_Title :
2008 Eighth IEEE International Conference on Data Mining (ICDM '08)
Conference_Location :
Pisa
ISSN :
1550-4786
Print_ISBN :
978-0-7695-3502-9
Type :
conf
DOI :
10.1109/ICDM.2008.75
Filename :
4781218