DocumentCode
2018258
Title
Topic-weak-correlated Latent Dirichlet allocation
Author
Tan, Yimin ; Ou, Zhijian
Author_Institution
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
fYear
2010
fDate
Nov. 29 2010-Dec. 3 2010
Firstpage
224
Lastpage
228
Abstract
Latent Dirichlet allocation (LDA) has been widely used for analyzing large text corpora. In this paper we propose topic-weak-correlated LDA (TWC-LDA) for topic modeling, which constrains different topics to be weakly correlated. This is technically achieved by placing a special prior over the topic-word distributions. Reducing the overlap between the topic-word distributions makes the learned topics more interpretable, in the sense that each topic's word distribution can be clearly associated with a distinctive semantic meaning. Experimental results on both synthetic and real-world corpora show the superiority of TWC-LDA over basic LDA for semantically meaningful topic discovery and document classification.
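A minimal sketch of the quantity the abstract targets. The paper's "special prior" is not specified in this record, so the functions below only illustrate the notion of overlap between topic-word distributions (here taken as the pairwise inner product, an assumption for illustration); a prior penalizing this overlap would push topics toward weak correlation.

```python
# Illustrative only: the inner-product overlap measure is an assumption,
# not the exact prior used in the TWC-LDA paper.

def topic_overlap(phi_i, phi_j):
    """Inner product of two topic-word distributions (0 = disjoint support)."""
    return sum(a * b for a, b in zip(phi_i, phi_j))

def total_pairwise_overlap(phi):
    """Sum of overlaps over all topic pairs; smaller values mean more
    weakly correlated, hence more distinctive, topics."""
    k = len(phi)
    return sum(topic_overlap(phi[i], phi[j])
               for i in range(k) for j in range(i + 1, k))

# Two near-disjoint topics overlap far less than two blurry, similar ones.
sharp = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]]   # overlap = 0.01
blurry = [[0.4, 0.3, 0.3], [0.3, 0.3, 0.4]]  # overlap = 0.33
assert total_pairwise_overlap(sharp) < total_pairwise_overlap(blurry)
```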
Keywords
data mining; document handling; text analysis; word processing; document classification; latent Dirichlet allocation; text corpora; topic discovery; topic-weak-correlated LDA; topic-word distribution; accuracy; adaptation model; computational modeling; correlation; semantics; vocabulary; topic modeling; weak-correlated topics
fLanguage
English
Publisher
IEEE
Conference_Titel
2010 7th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Conference_Location
Tainan
Print_ISBN
978-1-4244-6244-5
Type
conf
DOI
10.1109/ISCSLP.2010.5684906
Filename
5684906
Link To Document