DocumentCode :
510217
Title :
Topic Discovery Based on LDA Model with Fast Gibbs Sampling
Author :
Jing, Shi ; Wanlong, Li
Author_Institution :
Coll. of Comput. Sci. & Eng., Changchun Univ. of Technol., Changchun, China
Volume :
3
fYear :
2009
fDate :
7-8 Nov. 2009
Firstpage :
91
Lastpage :
95
Abstract :
Topic discovery, as described here, determines the topic that a document or segment discusses. It is important for natural language processing (NLP) applications such as information retrieval and extraction, summarization, and topic analysis. The paper extracts topic words based on Shannon information, with latent Dirichlet allocation (LDA) employed to represent the word distribution. Parameter estimation is accelerated by fast Gibbs sampling. Words that do not appear in the analyzed document can still be inferred as topic words with the help of background word clustering. Topics are represented as word groups. The experimental results show that the proposed approach performs far better than other methods.
Keywords :
information theory; pattern clustering; sampling methods; text analysis; LDA model; Shannon information; fast Gibbs sampling; latent Dirichlet allocation; parameter estimation; topic discovery; topic word extraction; word clustering; word distribution; word group; Application software; Artificial intelligence; Computational intelligence; Computer science; Data mining; Educational institutions; Linear discriminant analysis; Natural language processing; Parameter estimation; Sampling methods; inside-outside algorithm; semantic analysis; unsupervised model;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Artificial Intelligence and Computational Intelligence, 2009. AICI '09. International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-3835-8
Electronic_ISBN :
978-0-7695-3816-7
Type :
conf
DOI :
10.1109/AICI.2009.225
Filename :
5376542