DocumentCode :
188236
Title :
Refining LDA Results and Ranking Topics in Order of Quantity and Quality with an Application to Twitter Streaming Data
Author :
Fujino, Iwao
Author_Institution :
Sch. of Inf. & Telecommun. Eng., Tokai Univ., Tokyo, Japan
fYear :
2014
fDate :
13-15 Oct. 2014
Firstpage :
209
Lastpage :
216
Abstract :
Topic modeling is an emerging approach to summarizing data, especially text data, in terms of a small set of latent variables. The most useful implementation of topic modeling is the LDA method, an unsupervised machine learning technique for identifying latent topic information in a massive document collection. However, the LDA method sometimes yields results that are hard to understand or meaningless. To address this problem, we propose a method for refining LDA results and for ranking topics according to a significance criterion. Our study rests on two basic assumptions. The first is that the correlation coefficient between any two different topics should be zero under ideal conditions. The second is that the quality of a topic can be defined as its deviation from a background topic. Starting from these two assumptions, we provide a concrete method for determining the number of topics when using LDA to extract topics from document data, and for ranking the LDA results in order of quality. To validate the proposed methods, we conducted several experiments on Twitter streaming data. The results of these experiments show that our methods work efficiently, as expected.
Keywords :
document handling; learning (artificial intelligence); social networking (online); LDA method; Twitter streaming data; documents data topic extraction; massive document collection; ranking topics; topic model; unsupervised machine learning technique; Correlation; Correlation coefficient; Data models; Probability distribution; Refining; Twitter; Vectors; Jensen-Shannon divergence; LDA (Latent Dirichlet Allocation); Twitter; correlation coefficient; topic model;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2014 International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4799-6235-8
Type :
conf
DOI :
10.1109/CyberC.2014.45
Filename :
6984308