  • DocumentCode
    228495
  • Title
    Scalable distributed first story detection using Storm for Twitter data
  • Author
    Huddar, Mahesh G. ; Ramannavar, Manjula M. ; Sidnal, Nandini S.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., SJPN Trust's Hirasugar Inst. of Technol., Nidasoshi, India
  • fYear
    2014
  • fDate
    1-2 Aug. 2014
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Twitter is an online service that enables users to read and post tweets, thereby providing a wealth of information about breaking news stories. First Story Detection (FSD) is the problem of identifying the first story about each new event in a stream of documents. The locality-sensitive hashing (LSH) algorithm is the traditional approach to FSD. The documents exhibit a high degree of lexical variation, which makes FSD a very difficult task. This work uses Twitter as the data source to address the problem of real-time FSD. Because Twitter data contains a lot of spam, we built a dictionary of words to remove spam from the tweets. Furthermore, because the Twitter streaming data rate is high, the traditional LSH algorithm cannot be used to detect first stories. We modify the LSH algorithm to overcome this limitation, improving performance while maintaining reasonable accuracy. We also use the Storm distributed platform, so that the system benefits from the robustness, scalability and efficiency that this framework offers.
  • Keywords
    file organisation; social networking (online); text analysis; unsolicited e-mail; Storm distributed platform; Twitter streaming data rate; breaking news stories; data source; document streaming; efficiency improvement; lexical variation degree; locality sensitive hashing algorithm; online service; performance improvement; robustness improvement; scalability improvement; scalable distributed first-story detection problem; spam removal; tweets; word dictionary; Conferences; Fasteners; Parallel processing; Storms; Topology; Twitter; Vectors; Distributed platform; Efficiency; First Story Detection (FSD); Lexical variation; Robustness; Scalability; Storm;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Advances in Engineering and Technology Research (ICAETR), 2014 International Conference on
  • Conference_Location
    Unnao
  • ISSN
    2347-9337
  • Type
    conf
  • DOI
    10.1109/ICAETR.2014.7012915
  • Filename
    7012915
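The abstract does not describe the authors' modified LSH algorithm, so the following is only a minimal sketch of the conventional LSH-based first story detection it starts from: tweets become bag-of-words vectors, random-hyperplane hashing over several tables finds candidate neighbours, and a tweet whose nearest candidate is farther than a cosine-distance threshold is flagged as a first story. The class name, the parameters (`num_tables`, `bits_per_key`, `threshold`, `seed`) and the whitespace tokenisation are hypothetical choices for illustration. The sketch omits the spam dictionary, the paper's modifications, and any Storm integration.

```python
import math
import random
from collections import Counter, defaultdict


class LshFirstStoryDetector:
    """Hypothetical sketch of standard LSH-based FSD (not the paper's modified algorithm)."""

    def __init__(self, num_tables=4, bits_per_key=8, threshold=0.5, seed=0):
        self.rng = random.Random(seed)
        self.num_tables = num_tables
        self.threshold = threshold  # assumed cosine-distance cut-off for "new event"
        # One random hyperplane per (table, bit), grown lazily: term -> Gaussian weight.
        self.planes = [[{} for _ in range(bits_per_key)] for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.docs = []  # stored term-frequency vectors, indexed by document id

    def _weight(self, plane, term):
        # Draw a Gaussian component the first time a term is seen on this hyperplane.
        if term not in plane:
            plane[term] = self.rng.gauss(0.0, 1.0)
        return plane[term]

    def _key(self, table, vec):
        # Random-hyperplane hash: one sign bit per hyperplane in this table.
        bits = []
        for plane in self.planes[table]:
            dot = sum(self._weight(plane, t) * w for t, w in vec.items())
            bits.append("1" if dot >= 0 else "0")
        return "".join(bits)

    @staticmethod
    def _cosine_distance(a, b):
        dot = sum(w * b.get(t, 0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
            sum(w * w for w in b.values()))
        return 1.0 if norm == 0 else 1.0 - dot / norm

    def process(self, text):
        """Return (is_first_story, distance to the nearest colliding document)."""
        vec = Counter(text.lower().split())
        keys = [self._key(i, vec) for i in range(self.num_tables)]
        # Candidates are earlier documents sharing a bucket in at least one table.
        candidates = {d for i, k in enumerate(keys) for d in self.tables[i][k]}
        nearest = min((self._cosine_distance(vec, self.docs[d]) for d in candidates),
                      default=1.0)
        doc_id = len(self.docs)
        self.docs.append(vec)
        for i, k in enumerate(keys):
            self.tables[i][k].append(doc_id)
        return nearest > self.threshold, nearest
```

For example, `LshFirstStoryDetector().process("earthquake hits tokyo")` on an empty detector returns `(True, 1.0)`, because no earlier tweet collides with it; feeding the same text again returns a first-story flag of `False` with a distance of approximately zero. Hashing into several short-key tables trades a few missed neighbours for comparing each tweet against only a handful of candidates rather than the whole stream, which is the property that makes LSH attractive at high data rates.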