DocumentCode :
3461657
Title :
SAFAL: A MapReduce Spatio-temporal Analyzer for UNAVCO FTP Logs
Author :
Hodgkinson, Kathleen ; Rezgui, Abdelmounaam
Author_Institution :
Plate Boundary Obs., UNAVCO, Boulder, CO, USA
fYear :
2013
fDate :
3-5 Dec. 2013
Firstpage :
1083
Lastpage :
1090
Abstract :
UNAVCO is a National Science Foundation (NSF) funded consortium that facilitates geoscience research and education using geodesy. It is responsible for the collection, archiving, and distribution of data from GPS sites installed on every continent. In addition to GPS data, UNAVCO collects borehole seismic, strain meter, meteorological, and digital imagery data. One of UNAVCO's largest programs is the Plate Boundary Observatory (PBO), the geodetic component of the NSF-funded EarthScope program. PBO consists of over 1100 continuous GPS sites plus 80 borehole strain and seismic sites. In this paper, we present SAFAL, a Spatio-temporal Analyzer of FTP Access Logs collected by UNAVCO's data center. We developed SAFAL using Hadoop/MapReduce. The motivation for this work was to build an efficient system able to quickly identify trends in GPS data usage. The system is able to process millions of lines of data in minutes. It supports queries such as: (i) which GPS site is downloaded most, (ii) who downloads the most data, and (iii) which periods of data are of greatest interest. Answers to these and similar queries are useful for planning network growth, allocating Web resources, and tracking hot topics in geoscience research. They may also help UNAVCO illuminate dark data.
Keywords :
Global Positioning System; Internet; data acquisition; data mining; geodesy; geographic information systems; information retrieval; FTP access logs; GPS data usage; GPS sites; Hadoop; MapReduce spatio-temporal analyzer; NSF-funded EarthScope program; NSF-funded consortium; National Science Foundation; PBO; SAFAL; UNAVCO FTP logs; UNAVCO data center; Web usage mining; borehole seismic data; borehole strain; data archiving; data collection; data distribution; digital imagery data; geodetic component; geoscience education; geoscience research; meteorological data; plate boundary observatory; strain meter; Geoscience; Planning; Servers; US Government; MapReduce
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Science and Engineering (CSE), 2013 IEEE 16th International Conference on
Conference_Location :
Sydney, NSW
Type :
conf
DOI :
10.1109/CSE.2013.157
Filename :
6755338