Title :
A feature-enhanced smoothing method for LDA model applied to text classification
Author :
Liu, Dongxin ; Xu, Weiran ; Hu, Jiani
Author_Institution :
PRIS Lab., BUPT, Beijing, China
Abstract :
Latent Dirichlet Allocation (LDA) is a generative model that employs a symmetric Dirichlet distribution as the prior of the topic-word distributions to implement model smoothing. When LDA is applied to text classification, smoothing is essential to classification performance. In this paper, we propose a feature-enhanced smoothing method based on the idea that words that do not appear in the training corpus can help improve classification performance. The key point is to fully consider the relationship between a new document and the training corpus, and to enhance the document's class features by taking into account the words that do not appear in the training corpus. Evaluations on 20newsgroups show that feature-enhanced smoothing can significantly improve performance in binary text classification.
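For context, the sketch below illustrates only the standard smoothing the abstract refers to: under a symmetric Dirichlet(beta) prior, the posterior-mean topic-word probabilities assign non-zero mass to every vocabulary word, including those unseen in the training corpus. It is not the authors' feature-enhanced method (whose details are not given in this record); the function name, the toy counts, and the value of beta are illustrative assumptions.

```python
import numpy as np

def smoothed_topic_word_probs(topic_word_counts, beta=0.01):
    """Posterior mean of topic-word distributions under a symmetric
    Dirichlet(beta) prior: phi[k, w] = (n[k, w] + beta) / (n[k, .] + V * beta).

    Words with zero count in a topic still receive small non-zero probability,
    which is the smoothing effect the abstract mentions.
    """
    counts = np.asarray(topic_word_counts, dtype=float)  # shape (K topics, V words)
    V = counts.shape[1]
    return (counts + beta) / (counts.sum(axis=1, keepdims=True) + V * beta)

# Toy usage: 2 topics over a 4-word vocabulary; word 3 never occurs in topic 0.
counts = np.array([[5, 3, 2, 0],
                   [1, 0, 4, 5]])
phi = smoothed_topic_word_probs(counts, beta=0.1)
print(phi)  # each row sums to 1; zero counts get small but non-zero mass
```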
Keywords :
pattern classification; text analysis; Dirichlet distribution; LDA model; classification performance; feature-enhanced smoothing method; latent Dirichlet allocation; model smoothing; text classification; topic-word distributions; training corpus; Information entropy; Information retrieval; Laplace equations; Linear discriminant analysis; Maximum likelihood estimation; Probability distribution; Random variables; Smoothing methods; Text categorization; Vocabulary; Data-Driven Strategy; Latent Dirichlet Allocation; Text classification; feature-enhanced; smoothing
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
Conference_Location :
Dalian
Print_ISBN :
978-1-4244-4538-7
Electronic_ISBN :
978-1-4244-4540-0
DOI :
10.1109/NLPKE.2009.5313846