• DocumentCode
    2018258
  • Title
    Topic-weak-correlated Latent Dirichlet allocation
  • Author
    Tan, Yimin; Ou, Zhijian
  • Author_Institution
    Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
  • fYear
    2010
  • fDate
    Nov. 29 - Dec. 3, 2010
  • Firstpage
    224
  • Lastpage
    228
  • Abstract
    Latent Dirichlet allocation (LDA) has been widely used for analyzing large text corpora. In this paper we propose topic-weak-correlated LDA (TWC-LDA) for topic modeling, which constrains different topics to be weakly correlated. This is achieved technically by placing a special prior over the topic-word distributions. Reducing the overlap between the topic-word distributions makes the learned topics more interpretable, in the sense that each topic-word distribution can be clearly associated with a distinctive semantic meaning. Experimental results on both synthetic and real-world corpora show the superiority of TWC-LDA over basic LDA for semantically meaningful topic discovery and document classification.
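
    The abstract leaves the exact form of the special prior unspecified, but the property it targets, low overlap between the topic-word distributions, is straightforward to measure. The following is a minimal Python sketch, not from the paper: given any learned topic-word matrix (here called phi, one row per topic; the helper name topic_overlap is hypothetical), it scores pairwise topic similarity. Off-diagonal similarities near zero correspond to the weakly correlated topics TWC-LDA encourages.

        import numpy as np

        def topic_overlap(phi: np.ndarray) -> np.ndarray:
            """Pairwise cosine similarity between topic-word distributions.

            phi: (K, V) matrix; row k is topic k's distribution over the
            V-word vocabulary. Off-diagonal entries near 0 indicate
            well-separated (weakly correlated) topics.
            """
            unit = phi / np.linalg.norm(phi, axis=1, keepdims=True)
            return unit @ unit.T

        # Synthetic example: K topics drawn from a sparse symmetric Dirichlet,
        # standing in for the topic-word matrix a trained model would yield.
        rng = np.random.default_rng(0)
        K, V = 5, 1000
        phi = rng.dirichlet(np.full(V, 0.1), size=K)

        sim = topic_overlap(phi)
        off_diag = sim[~np.eye(K, dtype=bool)]
        print(f"mean off-diagonal topic similarity: {off_diag.mean():.3f}")
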
  • Keywords
    data mining; document handling; text analysis; word processing; document classification; latent Dirichlet allocation; text corpora; topic discovery; topic-weak-correlated LDA; topic-word distribution; Accuracy; Adaptation model; Computational modeling; Correlation; Semantics; Vocabulary; topic modeling; weak-correlated topics
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Title
    2010 7th International Symposium on Chinese Spoken Language Processing (ISCSLP)
  • Conference_Location
    Tainan
  • Print_ISBN
    978-1-4244-6244-5
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2010.5684906
  • Filename
    5684906