• DocumentCode
    2717730
  • Title

    An Efficient Clustering Algorithm for Microblogging Hot Topic Detection

  • Author

    Tu, Hao ; Ding, Jin

  • Author_Institution
    Network & Comput. Center, Huazhong Univ. of Sci. & Tech., Wuhan, China
  • fYear
    2012
  • fDate
    11-13 Aug. 2012
  • Firstpage
    738
  • Lastpage
    741
  • Abstract
    Microblogging has become exceedingly popular, with hundreds of millions of tweets posted every minute on a variety of topics. Most hot events are retweeted thousands of times within a short time, which helps us trace them. This paper focuses on tracing those events by mining the text stream of a microblog. Although event detection has long been a research topic, the characteristics of microblogs bring new challenges. Tweets reporting such events are usually overwhelmed by a flood of meaningless tweets, and the algorithm needs to be scalable given the sheer volume of tweets. First, we use Bayes classification to filter out the meaningless tweets; then we detect hot events from the remaining tweets with an incomplete clustering based on mean calculation. The experiments show that the algorithm can detect hot events in real time from a large volume of tweets while maintaining good accuracy.
  • Keywords
    Bayes methods; data mining; pattern clustering; social networking (online); text analysis; Bayes classification; Tweets; efficient clustering algorithm; event detection; mean calculation based incomplete clustering; microblogging hot topic detection; text stream mining; Accuracy; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Event detection; Filtering algorithms; Twitter; clustering algorithm; microblog; topic detection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science & Service System (CSSS), 2012 International Conference on
  • Conference_Location
    Nanjing
  • Print_ISBN
    978-1-4673-0721-5
  • Type

    conf

  • DOI
    10.1109/CSSS.2012.189
  • Filename
    6394427
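
The two-stage pipeline named in the abstract (Bayes filtering, then clustering around cluster means) can be illustrated with a minimal sketch. This record does not describe the paper's actual algorithm, so everything below is an assumption made for illustration: a multinomial naive Bayes filter with add-one smoothing, a single-pass clustering that compares each tweet with the mean term-frequency vector of each cluster using cosine similarity, and the hypothetical names `NaiveBayesFilter`, `single_pass_cluster`, `hot_topics`, `threshold`, and `min_size`. "Incomplete" is read loosely here as "one pass, with small clusters discarded".

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace tokenization; real tweets need more care.
    return text.lower().split()


class NaiveBayesFilter:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of tweets
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass_cluster(texts, threshold=0.5):
    """Assign each text to the most similar cluster mean, or open a new cluster."""
    clusters = []  # each cluster: running term sum, size, member texts
    for text in texts:
        vec = Counter(tokenize(text))
        best, best_sim = None, threshold
        for c in clusters:
            mean = {k: v / c["size"] for k, v in c["sum"].items()}
            sim = cosine(vec, mean)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"sum": Counter(vec), "size": 1, "members": [text]})
        else:
            best["sum"].update(vec)  # keeps the running sum; mean = sum / size
            best["size"] += 1
            best["members"].append(text)
    return clusters


def hot_topics(nb, tweets, keep_label="event", threshold=0.5, min_size=2):
    """Filter tweets with the classifier, cluster the rest, keep large clusters."""
    kept = [t for t in tweets if nb.predict(t) == keep_label]
    clusters = single_pass_cluster(kept, threshold)
    return [c["members"] for c in clusters if c["size"] >= min_size]
```

Storing a running term sum and a size per cluster means the mean can be updated in constant time per tweet, which is one plausible way to keep such a method scalable on a stream; whether the paper does this is not stated in the record.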