Title :
Scalable distributed first story detection using Storm for Twitter data
Author :
Huddar, Mahesh G. ; Ramannavar, Manjula M. ; Sidnal, Nandini S.
Author_Institution :
Dept. of Comput. Sci. & Eng., SJPN Trust's Hirasugar Inst. of Technol., Nidasoshi, India
Abstract :
Twitter is an online service that enables users to read and post tweets, thereby providing a wealth of information about breaking news stories. The problem of First Story Detection (FSD) is to identify the first story about each new event in a stream of documents. The locality-sensitive hashing (LSH) algorithm is the traditional approach to FSD. The documents exhibit a high degree of lexical variation, which makes FSD a very difficult task. This work uses Twitter as the data source to address the problem of real-time FSD. As Twitter data contains a lot of spam, we built a dictionary of words to remove spam from the tweets. Further, since the Twitter streaming data rate is high, we cannot use the traditional LSH algorithm to detect the first stories. We modify the LSH algorithm to overcome this limitation while maintaining reasonable accuracy with improved performance. We also use the Storm distributed platform, so that the system benefits from the robustness, scalability and efficiency that this framework offers.
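The abstract names locality-sensitive hashing as the traditional FSD approach but does not describe the authors' modified version, their spam dictionary, or their Storm topology. The sketch below illustrates only the traditional baseline, under stated assumptions: each tweet becomes a unit-length term-frequency vector, random-hyperplane LSH (which approximates cosine similarity) buckets it into several hash tables, and the novelty score is 1 minus the highest cosine similarity to any earlier tweet that collides with it. A tweet whose novelty exceeds a threshold is flagged as a first story. The class name `LSHFirstStoryDetector`, the helper `tokenize`, and the parameters `num_bits`, `num_tables` and `threshold` are hypothetical and not taken from the paper.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep purely alphabetic tokens only.
    return [w for w in text.lower().split() if w.isalpha()]


class LSHFirstStoryDetector:
    """Traditional LSH-based first story detection baseline (illustrative sketch).

    Not the authors' modified algorithm; parameter names are assumptions.
    """

    def __init__(self, num_bits=8, num_tables=4, threshold=0.5, seed=0):
        self.num_bits = num_bits        # hyperplanes (signature bits) per table
        self.num_tables = num_tables    # independent hash tables
        self.threshold = threshold      # novelty above this => first story
        self.rng = random.Random(seed)
        # Hyperplane components are sampled lazily per term from N(0, 1).
        self.planes = [[defaultdict(self._gauss) for _ in range(num_bits)]
                       for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.docs = []                  # stored unit vectors, indexed by doc id

    def _gauss(self):
        return self.rng.gauss(0.0, 1.0)

    @staticmethod
    def _vector(text):
        # Unit-length term-frequency vector as a sparse dict.
        counts = Counter(tokenize(text))
        norm = math.sqrt(sum(c * c for c in counts.values()))
        return {t: c / norm for t, c in counts.items()} if norm else {}

    def _signature(self, table, vec):
        # One bit per hyperplane: which side of the plane the vector falls on.
        bits = 0
        for b, plane in enumerate(self.planes[table]):
            if sum(w * plane[t] for t, w in vec.items()) >= 0:
                bits |= 1 << b
        return bits

    @staticmethod
    def _cosine(u, v):
        # Both vectors are unit length, so the dot product is the cosine.
        if len(u) > len(v):
            u, v = v, u
        return sum(w * v.get(t, 0.0) for t, w in u.items())

    def process(self, text):
        """Score a tweet, index it, and return (novelty, is_first_story)."""
        vec = self._vector(text)
        if not vec:                     # skip tweets with no usable tokens
            return 0.0, False
        sigs = [self._signature(i, vec) for i in range(self.num_tables)]
        candidates = set()
        for i, sig in enumerate(sigs):
            candidates.update(self.tables[i][sig])
        best = max((self._cosine(vec, self.docs[j]) for j in candidates),
                   default=0.0)
        novelty = 1.0 - best
        doc_id = len(self.docs)
        self.docs.append(vec)
        for i, sig in enumerate(sigs):
            self.tables[i][sig].append(doc_id)
        return novelty, novelty > self.threshold
```

For example, feeding `"earthquake hits the city center"` twice flags only the first copy as a first story, while a later tweet with no shared terms is flagged again. This baseline keeps every bucket entry and compares only against colliding tweets; bounding bucket sizes, or splitting hashing and scoring across parallel workers (as a Storm deployment would), are the kinds of changes a high-rate stream calls for, but the specific modification used in this work is not given here.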
Keywords :
file organisation; social networking (online); text analysis; unsolicited e-mail; Storm distributed platform; Twitter streaming data rate; breaking news stories; data source; document streaming; efficiency improvement; lexical variation degree; locality sensitive hashing algorithm; online service; performance improvement; robustness improvement; scalability improvement; scalable distributed first-story detection problem; spam removal; tweets; word dictionary; Conferences; Fasteners; Parallel processing; Storms; Topology; Twitter; Vectors; Distributed platform; Efficiency; First Story Detection (FSD); Lexical variation; Robustness; Scalability; Storm;
Conference_Title :
Advances in Engineering and Technology Research (ICAETR), 2014 International Conference on
Conference_Location :
Unnao
DOI :
10.1109/ICAETR.2014.7012915