DocumentCode
228495
Title
Scalable distributed first story detection using storm for twitter data
Author
Huddar, Mahesh G. ; Ramannavar, Manjula M. ; Sidnal, Nandini S.
Author_Institution
Dept. of Comput. Sci. & Eng., SJPN Trust's Hirasugar Inst. of Technol., Nidasoshi, India
fYear
2014
fDate
1-2 Aug. 2014
Firstpage
1
Lastpage
5
Abstract
Twitter is an online service that enables users to read and post tweets, thereby providing a wealth of information about breaking news stories. The problem of First Story Detection is to identify the first story about each event in a stream of documents. The locality-sensitive hashing algorithm is the traditional approach to First Story Detection. The documents have a high degree of lexical variation, which makes First Story Detection a very difficult task. This work uses Twitter as the data source to address the problem of real-time First Story Detection. Because Twitter data contains a lot of spam, we built a dictionary of words to remove spam from the tweets. Further, since the Twitter streaming data rate is high, the traditional locality-sensitive hashing algorithm cannot be used to detect first stories. We modify the locality-sensitive hashing algorithm to overcome this limitation, improving performance while maintaining reasonable accuracy. We also use the Storm distributed platform so that the system benefits from the robustness, scalability, and efficiency that this framework offers.
Keywords
file organisation; social networking (online); text analysis; unsolicited e-mail; Storm distributed platform; Twitter streaming data rate; breaking news stories; data source; document streaming; efficiency improvement; lexical variation degree; locality sensitive hashing algorithm; online service; performance improvement; robustness improvement; scalability improvement; scalable distributed first-story detection problem; spam removal; tweets; word dictionary; Conferences; Fasteners; Parallel processing; Storms; Topology; Twitter; Vectors; Distributed platform; Efficiency; First Story Detection (FSD); Lexical variation; Robustness; Scalability; Storm;
fLanguage
English
Publisher
IEEE
Conference_Titel
Advances in Engineering and Technology Research (ICAETR), 2014 International Conference on
Conference_Location
Unnao
ISSN
2347-9337
Type
conf
DOI
10.1109/ICAETR.2014.7012915
Filename
7012915