  • DocumentCode
    11803
  • Title
    On Summarization and Timeline Generation for Evolutionary Tweet Streams
  • Author
    Zhenhua Wang ; Lidan Shou ; Ke Chen ; Gang Chen ; Sharad Mehrotra
  • Author_Institution
    Coll. of Comput. Sci. & Technol., Zhejiang Univ., Hangzhou, China
  • Volume
    27
  • Issue
    5
  • fYear
    2015
  • fDate
    May 1 2015
  • Firstpage
    1301
  • Lastpage
    1315
  • Abstract
    Short-text messages such as tweets are being created and shared at an unprecedented rate. Tweets, in their raw form, while being informative, can also be overwhelming. For both end-users and data analysts, it is a nightmare to plow through millions of tweets which contain an enormous amount of noise and redundancy. In this paper, we propose a novel continuous summarization framework called Sumblr to alleviate the problem. In contrast to traditional document summarization methods, which focus on static and small-scale data sets, Sumblr is designed to deal with dynamic, fast-arriving, and large-scale tweet streams. Our proposed framework consists of three major components. First, we propose an online tweet stream clustering algorithm to cluster tweets and maintain distilled statistics in a data structure called the tweet cluster vector (TCV). Second, we develop a TCV-Rank summarization technique for generating online summaries and historical summaries of arbitrary time durations. Third, we design an effective topic evolution detection method, which monitors summary-based/volume-based variations to produce timelines automatically from tweet streams. Our experiments on large-scale real tweets demonstrate the efficiency and effectiveness of our framework.
  • Keywords
    data structures; document handling; social networking (online); statistics; vectors; Sumblr; TCV-Rank summarization; continuous summarization framework; data structure; distilled statistics; document summarization; evolutionary tweet streams; short-text messages; timeline generation; tweet cluster vector; Algorithm design and analysis; Clustering algorithms; Context; Data structures; Monitoring; Twitter; Vectors; Tweet stream; continuous summarization; summary; timeline;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2014.2345379
  • Filename
    6871372