  • DocumentCode
    245122
  • Title
    Hashtag Graph Based Topic Model for Tweet Mining
  • Author
    Yuan Wang ; Jie Liu ; Jishi Qu ; Yalou Huang ; Jimeng Chen ; Xia Feng
  • Author_Institution
    Coll. of Comput. & Control Eng., Nankai Univ., Tianjin, China
  • fYear
    2014
  • fDate
    14-17 Dec. 2014
  • Firstpage
    1025
  • Lastpage
    1030
  • Abstract
    Mining topics in Twitter is attracting increasing attention. However, the shortness and informality of tweets lead to extremely sparse vector representations over a large vocabulary, which often prevents conventional topic models (e.g., Latent Dirichlet Allocation) from recovering high-quality underlying topics. Fortunately, tweets typically come with rich user-generated hashtags that act as keywords. In this paper, we propose a novel topic model for such semi-structured tweets, the Hashtag Graph based Topic Model (HGTM). By exploiting the relations between hashtags in a hashtag graph, HGTM establishes semantic relations between words even when they have not co-occurred in the same tweet. In addition, HGTM strengthens the dependencies among multiple words and hashtags through latent variables (topics). We show that user-contributed hashtags can serve as weakly supervised information for topic modeling, and that hashtag relations can reveal semantic relations between tweets. Experiments on a real-world Twitter data set show that our model discovers more distinct and coherent topics than state-of-the-art baselines and is effective at controlling sparseness and noise in tweets.
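    The abstract's central intuition, that hashtag relations can link words which never appear in the same tweet, can be illustrated with a toy sketch. This is not HGTM itself (the record gives no model equations or inference procedure). It assumes a hashtag graph whose edges are weighted by how many tweets contain both hashtags, and treats two words as related when their tweets share a hashtag or carry hashtags that are adjacent in that graph. The tweets, hashtags, and function names (`build_hashtag_graph`, `words_by_hashtag`, `related_words`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy tweets as (words, hashtags); not data from the paper.
TWEETS = [
    ({"espresso", "morning"}, {"#coffee"}),
    ({"latte", "art"}, {"#cafe"}),
    ({"best", "shop"}, {"#coffee", "#cafe"}),
]


def build_hashtag_graph(tweets):
    """Undirected weighted graph: edge weight = number of tweets containing both hashtags.

    Assumed construction for illustration only.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for _, tags in tweets:
        for a, b in combinations(sorted(tags), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def words_by_hashtag(tweets):
    """Map each hashtag to the set of words that appear in tweets carrying it."""
    index = defaultdict(set)
    for words, tags in tweets:
        for tag in tags:
            index[tag] |= words
    return index


def related_words(word, tweets):
    """Words reachable from `word` via a shared hashtag or one hop in the hashtag graph."""
    graph = build_hashtag_graph(tweets)
    index = words_by_hashtag(tweets)
    # Hashtags of every tweet that contains the query word.
    tags = {t for words, ts in tweets if word in words for t in ts}
    # Expand to hashtags adjacent in the graph.
    reachable = set(tags)
    for tag in tags:
        reachable |= set(graph[tag])
    related = set()
    for tag in reachable:
        related |= index[tag]
    related.discard(word)
    return related
```

    Here "espresso" and "latte" never share a tweet, yet `related_words("espresso", TWEETS)` includes "latte" because #coffee and #cafe co-occur in the third tweet. HGTM goes further by modeling such dependencies through latent topics; the sketch only shows the graph-based intuition.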
  • Keywords
    data mining; social networking (online); HGTM; extreme sparse vector representation; hash tag graph based topic model; hash tag relation; keywords; latent variables; semistructured tweets; topic modeling; topics mining; tweet mining; twitter data set; user-generated hash tags; weakly supervised information; word semantic relations; Analytical models; Data models; Educational institutions; Mathematical model; Semantics; Twitter; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2014 IEEE International Conference on
  • Conference_Location
    Shenzhen
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4799-4303-6
  • Type
    conf
  • DOI
    10.1109/ICDM.2014.60
  • Filename
    7023441