DocumentCode :
145131
Title :
An improved LDA algorithm for text classification
Author :
Dexin Zhao ; Jinqun He ; Jin Liu
Author_Institution :
Tianjin Key Lab. of Intell. Comput. & Novel Software Technol., Tianjin Univ. of Technol., Tianjin, China
Volume :
1
fYear :
2014
fDate :
26-28 April 2014
Firstpage :
217
Lastpage :
221
Abstract :
Latent Dirichlet Allocation (LDA) is a classic topic model that can extract latent topics from a large data corpus. This model assumes that if a document is relevant to a topic, then all tokens in the document are relevant to that topic. In this paper, we present an algorithm called gLDA for topic text classification, which adds a topic-category distribution parameter to LDA so that each document is generated from its most relevant category. Gibbs sampling is employed for approximate inference, and experimental results on two datasets show the effectiveness of the method.
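The abstract names Gibbs sampling as the inference method but does not give gLDA's topic-category parameter or its sampling equations. The sketch below therefore shows only the standard collapsed Gibbs sampler for plain LDA (in the form popularized by Griffiths and Steyvers), not gLDA itself. The function name `lda_gibbs`, the symmetric priors `alpha` and `beta`, and their default values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iter=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA (hypothetical helper).

    docs: list of lists of word ids in [0, vocab_size).
    Returns (theta, phi): document-topic and topic-word estimates.
    This is plain LDA; it does NOT model gLDA's topic-category parameter.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # topic counts per document
    n_kw = np.zeros((n_topics, vocab_size))  # word counts per topic
    n_k = np.zeros(n_topics)                 # total tokens per topic
    z = []                                   # topic assignment per token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = k | z_-i, w), up to a constant.
                p = ((n_dk[d] + alpha) * (n_kw[:, w] + beta)
                     / (n_k + vocab_size * beta))
                k = rng.choice(n_topics, p=p / p.sum())
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1

    # Point estimates from the final sample's counts.
    theta = ((n_dk + alpha)
             / (n_dk.sum(axis=1, keepdims=True) + n_topics * alpha))
    phi = (n_kw + beta) / (n_k[:, None] + vocab_size * beta)
    return theta, phi


docs = [[0, 1, 0, 1, 0], [2, 3, 2, 3, 3]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=4, n_iter=50)
print(theta.round(2))
```

Each row of `theta` is a distribution over topics for one document, and each row of `phi` is a distribution over the vocabulary for one topic. A supervised variant such as gLDA would additionally condition these draws on document categories, which this sketch does not attempt.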
Keywords :
pattern classification; sampling methods; text analysis; Gibbs sampling; LDA algorithm; approximate inference; data corpus; gLDA; latent Dirichlet allocation; topic text classification; topic-category distribution parameter; Accuracy; Data models; Predictive models; Resource management; Text categorization; Training; Training data; LDA; text classification; topic model;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Science, Electronics and Electrical Engineering (ISEEE), 2014 International Conference on
Conference_Location :
Sapporo
Print_ISBN :
978-1-4799-3196-5
Type :
conf
DOI :
10.1109/InfoSEEE.2014.6948100
Filename :
6948100