DocumentCode :
2261165
Title :
A feature-enhanced smoothing method for LDA model applied to text classification
Author :
Liu, Dongxin ; Xu, Weiran ; Hu, Jiani
Author_Institution :
PRIS Lab., BUPT, Beijing, China
fYear :
2009
fDate :
24-27 Sept. 2009
Firstpage :
1
Lastpage :
7
Abstract :
Latent Dirichlet Allocation (LDA) is a generative model that uses a symmetric Dirichlet distribution as the prior on the topic-word distributions to implement model smoothing. When LDA is applied to text classification, smoothing is essential to classification performance. In this paper, we propose a feature-enhanced smoothing method based on the idea that words that do not appear in the training corpus can help improve classification performance. The key point is to fully consider the relationship between the new document and the training corpus, and to enhance the document's class feature by taking into account the words that do not appear in the training corpus. Evaluations on 20newsgroups show that feature-enhanced smoothing can significantly improve performance in two-class text classification.
Keywords :
pattern classification; text analysis; Dirichlet distribution; LDA model; classification performance; feature-enhanced smoothing method; latent Dirichlet allocation; model smoothing; text classification; topic-words distributions; training corpus; Information entropy; Information retrieval; Laplace equations; Linear discriminant analysis; Maximum likelihood estimation; Probability distribution; Random variables; Smoothing methods; Text categorization; Vocabulary; Data-Driven Strategy; Latent Dirichlet Allocation; Text classification; feature-enhanced; smoothing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
Conference_Location :
Dalian
Print_ISBN :
978-1-4244-4538-7
Electronic_ISBN :
978-1-4244-4540-0
Type :
conf
DOI :
10.1109/NLPKE.2009.5313846
Filename :
5313846