• DocumentCode
    2772500
  • Title

    Modeling Syntactic Structures of Topics with a Nested HMM-LDA

  • Author

    Jiang, Jing

  • Author_Institution
    Sch. of Inf. Syst., Singapore Manage. Univ., Singapore, Singapore
  • fYear
    2009
  • fDate
    6-9 Dec. 2009
  • Firstpage
    824
  • Lastpage
    829
  • Abstract
    Latent Dirichlet allocation (LDA) is a commonly used topic modeling method for text analysis and mining. Standard LDA treats documents as bags of words, ignoring the syntactic structures of sentences. In this paper, we propose a hybrid model that embeds hidden Markov models (HMMs) within LDA topics to jointly model both the topics and the syntactic structures within each topic. Our model is general and subsumes standard LDA and HMM as special cases. Compared with standard LDA and HMM, our model can simultaneously discover both topic-specific content words and background function words shared among topics. Our model can also automatically separate content words that play different roles within a topic. Using perplexity as the evaluation metric, our model achieves lower perplexity on unseen test documents than standard LDA, which indicates that it generalizes better than LDA.
  • Keywords
    data mining; hidden Markov models; text analysis; background functional words; hidden Markov models; latent Dirichlet allocation; syntactic structure modeling; text analysis; text mining; topic modeling method; topic-specific content words; Conference management; Content management; Data mining; Hidden Markov models; Information management; Information retrieval; Linear discriminant analysis; Management information systems; Text analysis; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on
  • Conference_Location
    Miami, FL
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4244-5242-2
  • Electronic_ISBN
    1550-4786
  • Type

    conf

  • DOI
    10.1109/ICDM.2009.144
  • Filename
    5360318