Title :
Topic Detection in Twitter Based on Label Propagation Model
Author :
Dongxu Huang ; Dejun Mu
Author_Institution :
Sch. of Autom., Northwest Polytech. Univ., Xi'an, China
Abstract :
Huge numbers of tweets about real-world events are generated every day on Twitter. However, these messages are unorganized, and classifying them by topic and event is one of the challenges in extracting knowledge effectively. To address this problem, we propose a novel method that combines a clustering algorithm with a label propagation algorithm to detect topics in Twitter. First, we use the canopy clustering algorithm to cluster tweets; because canopy clustering can assign a tweet to more than one cluster, only tweets that belong to exactly one cluster are labeled at this stage. Second, a label propagation mechanism is used to label the tweets that lie in the overlap between clusters. To evaluate our algorithm, we compare it with two baseline algorithms, LDA (Latent Dirichlet Allocation) and the Single-Pass clustering algorithm. We apply the three algorithms to a tweet dataset containing three topics and some noisy data, and the experimental results show that our method outperforms the other algorithms in precision and recall.
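The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the tweet representation, the similarity measure, the canopy thresholds, or the propagation rule. The sketch assumes bag-of-words vectors, cosine similarity, two hypothetical thresholds `t_loose` and `t_tight`, and a single similarity-weighted vote from already-labeled tweets as a stand-in for label propagation. All function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term counts (an assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def canopy(vectors, t_loose, t_tight):
    """Canopy clustering on similarity; canopies may overlap.

    A point joins the current canopy if its similarity to the center
    is >= t_loose, and stops being a candidate center if it is >= t_tight.
    """
    assert t_tight >= t_loose
    remaining = list(range(len(vectors)))
    canopies = []
    while remaining:
        center = remaining[0]
        members, keep = {center}, []
        for i in remaining:
            if i == center:
                continue  # the center always leaves the candidate list
            s = cosine(vectors[center], vectors[i])
            if s >= t_loose:
                members.add(i)
            if s < t_tight:
                keep.append(i)
        canopies.append(members)
        remaining = keep
    return canopies


def detect_topics(tweets, t_loose=0.3, t_tight=0.7):
    """Return one canopy index (topic label) per tweet."""
    vectors = [vectorize(t) for t in tweets]
    membership = defaultdict(list)
    for label, members in enumerate(canopy(vectors, t_loose, t_tight)):
        for i in members:
            membership[i].append(label)

    # Stage 1: tweets that fall in exactly one canopy take its label.
    seeds = {i: ls[0] for i, ls in membership.items() if len(ls) == 1}
    labels = dict(seeds)

    # Stage 2 (assumed rule): a tweet in several canopies takes the
    # candidate label with the largest total similarity to seed tweets.
    for i, candidates in membership.items():
        if len(candidates) > 1:
            scores = defaultdict(float)
            for j, lab in seeds.items():
                if lab in candidates:
                    scores[lab] += cosine(vectors[i], vectors[j])
            labels[i] = max(candidates, key=lambda lab: scores[lab])
    return [labels[i] for i in range(len(tweets))]
```

Calling `detect_topics` on a list of tweet strings returns one integer label per tweet. Tweets that share vocabulary with two groups land in both canopies and are resolved in the second stage. The thresholds strongly affect how many canopies form, so they would need tuning on real data.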
Keywords :
information retrieval; pattern clustering; social networking (online); LDA algorithm; Twitter; canopy cluster algorithm; disorganization messages; label propagation model; latent Dirichlet allocation; precision rate; recall rate; single-pass cluster algorithm; topic detection; tweet dataset; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Computational modeling; Time complexity; Vectors; cluster algorithm
Conference_Title :
Distributed Computing and Applications to Business, Engineering and Science (DCABES), 2014 13th International Symposium on
Conference_Location :
Xian Ning
Print_ISBN :
978-1-4799-4170-4
DOI :
10.1109/DCABES.2014.23