• DocumentCode
    2261165
  • Title
    A feature-enhanced smoothing method for LDA model applied to text classification
  • Author
    Liu, Dongxin ; Xu, Weiran ; Hu, Jiani
  • Author_Institution
    PRIS Lab., BUPT, Beijing, China
  • fYear
    2009
  • fDate
    24-27 Sept. 2009
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    Latent Dirichlet Allocation (LDA) is a generative model that uses a symmetric Dirichlet distribution as the prior over the topic-word distributions to smooth the model. When LDA is applied to text classification, smoothing is essential to classification performance. In this paper, we propose a feature-enhanced smoothing method based on the idea that words that do not appear in the training corpus can help improve classification performance. The key point is to take full account of the relationship between the new document and the training corpus, and to enhance the document's class features by making use of the words that do not appear in the training corpus. Evaluations on the 20 Newsgroups dataset show that feature-enhanced smoothing can significantly improve performance in binary (two-class) text classification.
  • Keywords
    pattern classification; text analysis; Dirichlet distribution; LDA model; classification performance; feature-enhanced smoothing method; latent Dirichlet allocation; model smoothing; text classification; topic-words distributions; training corpus; Information entropy; Information retrieval; Laplace equations; Linear discriminant analysis; Maximum likelihood estimation; Probability distribution; Random variables; Smoothing methods; Text categorization; Vocabulary; Data-Driven Strategy; Latent Dirichlet Allocation; Text classification; feature-enhanced; smoothing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
  • Conference_Location
    Dalian
  • Print_ISBN
    978-1-4244-4538-7
  • Electronic_ISBN
    978-1-4244-4540-0
  • Type
    conf
  • DOI
    10.1109/NLPKE.2009.5313846
  • Filename
    5313846