Title :
Hashtag Graph Based Topic Model for Tweet Mining
Author :
Yuan Wang ; Jie Liu ; Jishi Qu ; Yalou Huang ; Jimeng Chen ; Xia Feng
Author_Institution :
Coll. of Comput. & Control Eng., Nankai Univ., Tianjin, China
Abstract :
Mining topics in Twitter is attracting increasing attention. However, the shortness and informality of tweets lead to extremely sparse vector representations with a large vocabulary, which often causes conventional topic models (e.g., Latent Dirichlet Allocation) to fail to discover high-quality underlying topics. Fortunately, tweets often come with rich user-generated hashtags that act as keywords. In this paper, we propose a novel topic model to handle such semi-structured tweets, called the Hashtag Graph based Topic Model (HGTM). By utilizing the relation information between hashtags in our hashtag graph, HGTM establishes semantic relations between words even if they have not co-occurred within a specific tweet. In addition, we enhance the dependencies among words and hashtags via the latent variables (topics) modeled by HGTM. We show that user-contributed hashtags can serve as weakly supervised information for topic modeling, and that hashtag relations can reveal the semantic relations between tweets. Experiments on a real-world Twitter data set show that our model discovers more distinct and coherent topics than state-of-the-art baselines and has a strong ability to control sparseness and noise in tweets.
Keywords :
data mining; social networking (online); HGTM; extreme sparse vector representation; hash tag graph based topic model; hash tag relation; keywords; latent variables; semistructured tweets; topic modeling; topics mining; tweet mining; twitter data set; user-generated hash tags; weakly supervised information; word semantic relations; Analytical models; Data models; Educational institutions; Mathematical model; Semantics; Twitter; Vectors
Conference_Title :
Data Mining (ICDM), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4799-4303-6
DOI :
10.1109/ICDM.2014.60