DocumentCode :
2018258
Title :
Topic-weak-correlated Latent Dirichlet allocation
Author :
Tan, Yimin ; Ou, Zhijian
Author_Institution :
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
fYear :
2010
fDate :
Nov. 29 2010-Dec. 3 2010
Firstpage :
224
Lastpage :
228
Abstract :
Latent Dirichlet allocation (LDA) has been widely used for analyzing large text corpora. In this paper we propose topic-weak-correlated LDA (TWC-LDA) for topic modeling, which constrains different topics to be weakly correlated. This is achieved technically by placing a special prior over the topic-word distributions. Reducing the overlap between the topic-word distributions makes the learned topics more interpretable, in the sense that each topic-word distribution can be clearly associated with a distinctive semantic meaning. Experimental results on both synthetic and real-world corpora show the superiority of TWC-LDA over basic LDA for semantically meaningful topic discovery and document classification.
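The abstract's core idea is that topic-word distributions should overlap as little as possible. The paper's actual prior is not reproduced here; as a minimal illustrative sketch (not the authors' method), one way to quantify the overlap that TWC-LDA seeks to reduce is the mean pairwise cosine similarity between the rows of a topic-word matrix:

```python
import numpy as np

def topic_overlap(phi):
    """Mean pairwise cosine similarity between topic-word
    distributions (rows of phi).  Lower values mean the topics
    are more weakly correlated, i.e. less overlapping."""
    phi = np.asarray(phi, dtype=float)
    unit = phi / np.linalg.norm(phi, axis=1, keepdims=True)
    sim = unit @ unit.T                      # K x K cosine-similarity matrix
    k = sim.shape[0]
    off_diag = sim[~np.eye(k, dtype=bool)]   # drop self-similarity entries
    return off_diag.mean()

# Two near-disjoint topics overlap far less than two similar ones.
disjoint = [[0.45, 0.45, 0.05, 0.05],
            [0.05, 0.05, 0.45, 0.45]]
similar  = [[0.40, 0.30, 0.20, 0.10],
            [0.30, 0.40, 0.20, 0.10]]
print(topic_overlap(disjoint) < topic_overlap(similar))  # True
```

Under this reading, the "special prior" in the paper penalizes configurations where this kind of overlap score is high, pushing each learned topic toward a distinctive region of the vocabulary.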
Keywords :
data mining; document handling; text analysis; word processing; document classification; latent Dirichlet allocation; text corpora; topic discovery; topic weak correlated LDA; topic word distribution; Accuracy; Adaptation model; Computational modeling; Correlation; Neodymium; Semantics; Vocabulary; topic modeling; weak-correlated topics;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
Conference_Location :
Tainan
Print_ISBN :
978-1-4244-6244-5
Type :
conf
DOI :
10.1109/ISCSLP.2010.5684906
Filename :
5684906