DocumentCode :
3739134
Title :
Event Detection from Millions of Tweets Related to the Great East Japan Earthquake Using Feature Selection Technique
Author :
Takako Hashimoto;Dave Shepard;Tetsuji Kuboyama;Kilho Shin
Author_Institution :
Chiba Univ. of Commerce, Chiba, Japan
fYear :
2015
Firstpage :
7
Lastpage :
12
Abstract :
Social media offers a wealth of insight into howsignificant events -- such as the Great East Japan Earthquake, the Arab Spring, and the Boston Bombing -- affect individuals. The scale of available data, however, can be intimidating: duringthe Great East Japan Earthquake, over 8 million tweets weresent each day from Japan alone. Conventional word vector-based event-detection techniques for social media that use Latent SemanticAnalysis, Latent Dirichlet Allocation, or graph communitydetection often cannot scale to such a large volume of data due to their space and time complexity. To alleviate this problem, we propose an efficient method for event detection by leveraging a fast feature selection algorithm called CWC. While we begin withword count vectors of authors and words for each time slot (inour case, every hour), we extract discriminative words from eachslot using CWC, which vastly reduces the number of features to track. We then convert these word vectors into a time series of vector distances from the initial point. The distance betweeneach time slot and the initial point remains high while an eventis happening, yet declines sharply when the event ends, offeringan accurate portrait of the span of an event. This method makes it possible to detect events from vast datasets. To demonstrateour method´s effectiveness, we extract events from a dataset ofover two hundred million tweets sent in the 21 days followingthe Great East Japan Earthquake. With CWC, we can identifyevents from this dataset with great speed and accuracy.
Keywords :
"Feature extraction","Earthquakes","Media","Time series analysis","Clustering algorithms","Event detection","Electronic mail"
Publisher :
ieee
Conference_Titel :
Data Mining Workshop (ICDMW), 2015 IEEE International Conference on
Electronic_ISBN :
2375-9259
Type :
conf
DOI :
10.1109/ICDMW.2015.248
Filename :
7395645
Link To Document :
بازگشت