DocumentCode
2261165
Title
A feature-enhanced smoothing method for LDA model applied to text classification
Author
Liu, Dongxin ; Xu, Weiran ; Hu, Jiani
Author_Institution
PRIS Lab., BUPT, Beijing, China
fYear
2009
fDate
24-27 Sept. 2009
Firstpage
1
Lastpage
7
Abstract
Latent Dirichlet Allocation (LDA) is a generative model that uses a symmetric Dirichlet distribution as the prior of the topic-word distributions to implement model smoothing. When LDA is applied to text classification, smoothing is essential to classification performance. In this paper, we propose a feature-enhanced smoothing method based on the idea that words that do not appear in the training corpus can help improve classification performance. The key point is to fully consider the relationship between the new document and the training corpus, and to enhance the document's class features by taking into account the words that do not appear in the training corpus. Evaluations on 20newsgroups show that feature-enhanced smoothing can significantly improve performance in bi-class text classification.
Keywords
pattern classification; text analysis; Dirichlet distribution; LDA model; classification performance; feature-enhanced smoothing method; latent Dirichlet allocation; model smoothing; text classification; topic-words distributions; training corpus; Information entropy; Information retrieval; Laplace equations; Linear discriminant analysis; Maximum likelihood estimation; Probability distribution; Random variables; Smoothing methods; Text categorization; Vocabulary; Data-Driven Strategy; Latent Dirichlet Allocation; Text classification; feature-enhanced; smoothing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
Conference_Location
Dalian
Print_ISBN
978-1-4244-4538-7
Electronic_ISBN
978-1-4244-4540-0
Type
conf
DOI
10.1109/NLPKE.2009.5313846
Filename
5313846