DocumentCode :
676932
Title :
A multi-terabyte relational database for geo-tagged social network data
Author :
Dobos, Lubomir ; Szule, Janos ; Bodnar, Todd ; Hanyecz, Tamas ; Sebok, Tamas ; Kondor, Daniel ; Kallus, Zsofia ; Steger, Jozsef ; Csabai, Istvan ; Vattay, Gabor
Author_Institution :
Dept. of Phys. of Complex Syst., Eotvos Lorand Univ., Budapest, Hungary
fYear :
2013
fDate :
2-5 Dec. 2013
Firstpage :
289
Lastpage :
294
Abstract :
Despite their relatively low sampling factor, the freely available, randomly sampled status streams of Twitter are very useful sources of geographically embedded social network data. To statistically analyze the information Twitter provides via these streams, we have collected a year's worth of data and built a multi-terabyte relational database from it. The database is designed for fast data loading and to support a wide range of studies focusing on the statistics and geographic features of social networks, as well as on the linguistic analysis of tweets. In this paper we present the method of data collection, the database design, the data loading procedure and special treatment of geo-tagged and multi-lingual data. We also provide some SQL recipes for computing network statistics.
Keywords :
SQL; geography; information retrieval; relational databases; social networking (online); statistical analysis; SQL recipes; Twitter; data collection; data loading procedure; database design; geo-tagged social network data; geographic features; geographically embedded social network data; linguistic analysis; multiterabyte relational database; network statistics; random sampled status streams; Global Positioning System; Indexes; Loading; Servers
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cognitive Infocommunications (CogInfoCom), 2013 IEEE 4th International Conference on
Conference_Location :
Budapest
Print_ISBN :
978-1-4799-1543-9
Type :
conf
DOI :
10.1109/CogInfoCom.2013.6719259
Filename :
6719259