Title :
An improved LDA algorithm for text classification
Author :
Dexin Zhao ; Jinqun He ; Jin Liu
Author_Institution :
Tianjin Key Lab. of Intell. Comput. & Novel Software Technol., Tianjin Univ. of Technol., Tianjin, China
Abstract :
Latent Dirichlet Allocation (LDA) is a classic topic model that can extract latent topics from large corpora. The model assumes that if a document is relevant to a topic, then all tokens in the document are relevant to that topic. In this paper, we present an algorithm called gLDA for topic-based text classification that adds a topic-category distribution parameter to LDA, so that each document is generated from its most relevant category. Gibbs sampling is employed for approximate inference, and experimental results on two datasets show the effectiveness of the method.
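Note :
The record contains no code. As a rough illustration of the collapsed Gibbs sampling inference step the abstract refers to, a minimal sketch for standard LDA (not the paper's gLDA extension; the corpus, number of topics K, and hyperparameters alpha and beta below are illustrative assumptions) might look like this:

```python
# Minimal collapsed Gibbs sampler for standard LDA, given only as a hedged
# illustration of the inference step mentioned in the abstract. This is NOT
# the paper's gLDA (no topic-category distribution parameter); the toy corpus
# and hyperparameters are assumptions, not taken from the paper.
import numpy as np

def lda_gibbs(docs, vocab_size, K=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))   # n_{d,k}: topic counts per document
    nkw = np.zeros((K, vocab_size))  # n_{k,w}: word counts per topic
    nk = np.zeros(K)                 # n_k: total tokens assigned to each topic
    z = []                           # current topic assignment of every token

    # Random initialisation of token-topic assignments.
    for d, doc in enumerate(docs):
        zd = rng.integers(K, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Full conditional p(z_i = k | rest), up to a constant.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    # Posterior mean estimates of theta (doc-topic) and phi (topic-word).
    theta = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + K * alpha)
    phi = (nkw + beta) / (nkw.sum(axis=1, keepdims=True) + vocab_size * beta)
    return theta, phi

# Toy usage: two tiny "documents" over a 4-word vocabulary.
docs = [[0, 0, 1, 1, 0], [2, 3, 3, 2, 3]]
theta, phi = lda_gibbs(docs, vocab_size=4, K=2)
print(theta.round(2))
```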
Keywords :
pattern classification; sampling methods; text analysis; Gibbs sampling; LDA algorithm; approximate inference; data corpus; gLDA; latent Dirichlet allocation; topic text classification; topic-category distribution parameter; Accuracy; Data models; Predictive models; Resource management; Text categorization; Training; Training data; LDA; text classification; topic model
Conference_Title :
Information Science, Electronics and Electrical Engineering (ISEEE), 2014 International Conference on
Conference_Location :
Sapporo
Print_ISBN :
978-1-4799-3196-5
DOI :
10.1109/InfoSEEE.2014.6948100