Title :
A MapReduce Algorithm for Polygon Retrieval in Geospatial Analysis
Author :
Guo, Qiulei ; Palanisamy, Balaji ; Karimi, Hassan A.
Author_Institution :
Sch. of Inf. Sci., Univ. of Pittsburgh, Pittsburgh, PA, USA
Abstract :
The proliferation of data acquisition devices such as 3D laser scanners has led to a burst of large-scale spatial terrain data, which poses many challenges for spatial data analysis and computation. With the advent of several emerging cloud technologies, a natural and cost-effective approach to managing such large-scale data is to store and process these datasets in a publicly hosted cloud service using modern distributed computing paradigms such as MapReduce. For several key spatial data analysis and computation problems, polygon retrieval is a fundamental operation that often must be computed under real-time constraints. However, existing sequential algorithms fail to meet this demand effectively, given that terrain data have seen unprecedented growth in both volume and rate in recent years. In this work, we present a MapReduce-based parallel polygon retrieval algorithm that aims to minimize the I/O and CPU loads of the map and reduce tasks during spatial data processing. Our proposed algorithm first hierarchically indexes the spatial terrain data using a quad-tree index, which allows a significant amount of data to be filtered out in the pre-processing stage based on the query object. In addition, a prefix tree based on the quad-tree index is built to query the relationship between the terrain data and the query area in real time, which leads to significant savings in both I/O load and CPU time. The performance of the proposed techniques is evaluated on a Hadoop cluster, and the results demonstrate that the proposed techniques are scalable and reduce the execution time of the polygon retrieval operation by more than 35% compared with existing distributed algorithms.
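The quad-tree filtering and prefix-tree lookup summarized in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the point-based data model, the rectangular bounds, the quadrant digit order 0 to 3, and every function name below (point_in_polygon, segment_hits_rect, classify, children_of, build_query_trie, quadkey, in_query, map_points) are hypothetical choices made for illustration. The query polygon is decomposed into quad-tree cells labeled inside, partial, or outside; outside cells are pruned, and the remaining cells are stored in a prefix tree keyed by their quad codes. Each data point is then located by its own quad code, so only points that fall in boundary (partial) cells need an exact point-in-polygon test.

```python
# Minimal sketch under stated assumptions (not the paper's code): a query
# prefix tree keyed by quad-tree cell codes, used to filter points before any
# exact geometric test. All names here are hypothetical.

INSIDE, PARTIAL, OUTSIDE = "inside", "partial", "outside"


def point_in_polygon(x, y, poly):
    """Even-odd ray casting: cast a ray toward +x and count edge crossings."""
    inside = False
    n = len(poly)
    for i in range(n):
        (xa, ya), (xb, yb) = poly[i], poly[(i + 1) % n]
        if (ya > y) != (yb > y):
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x < x_cross:
                inside = not inside
    return inside


def segment_hits_rect(x0, y0, x1, y1, xmin, ymin, xmax, ymax):
    """Liang-Barsky test: does any part of the segment lie in the closed rectangle?"""
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0), (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:  # parallel to this boundary and outside it
                return False
        elif p < 0:  # entering this half-plane
            t0 = max(t0, q / p)
        else:  # leaving this half-plane
            t1 = min(t1, q / p)
        if t0 > t1:
            return False
    return True


def classify(cell, poly):
    """Relationship of a quad-tree cell (xmin, ymin, xmax, ymax) to the polygon."""
    xmin, ymin, xmax, ymax = cell
    n = len(poly)
    for i in range(n):
        (xa, ya), (xb, yb) = poly[i], poly[(i + 1) % n]
        if segment_hits_rect(xa, ya, xb, yb, xmin, ymin, xmax, ymax):
            return PARTIAL  # the polygon boundary crosses or touches the cell
    # No edge touches the cell, so it is wholly inside or wholly outside.
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    return INSIDE if point_in_polygon(cx, cy, poly) else OUTSIDE


def children_of(cell):
    """Quadrants in digit order: 0 low-x/low-y, 1 high-x/low-y, 2 low-x/high-y, 3 high-x/high-y."""
    xmin, ymin, xmax, ymax = cell
    xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
    return [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
            (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]


def build_query_trie(poly, cell, max_depth, depth=0):
    """Prefix tree over quad codes; OUTSIDE cells are pruned (absent from the tree)."""
    status = classify(cell, poly)
    if status == OUTSIDE:
        return None
    if status == INSIDE or depth == max_depth:
        return {"status": status}
    children = {}
    for digit, sub in enumerate(children_of(cell)):
        node = build_query_trie(poly, sub, max_depth, depth + 1)
        if node is not None:
            children[str(digit)] = node
    return {"status": PARTIAL, "children": children}


def quadkey(x, y, bounds, depth):
    """Quad code of the depth-level cell containing (x, y), same digit order as children_of."""
    xmin, ymin, xmax, ymax = bounds
    digits = []
    for _ in range(depth):
        xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
        d = 0
        if x >= xm:
            d, xmin = d + 1, xm
        else:
            xmax = xm
        if y >= ym:
            d, ymin = d + 2, ym
        else:
            ymax = ym
        digits.append(str(d))
    return "".join(digits)


def in_query(x, y, trie, bounds, max_depth, poly):
    """Walk the prefix tree; only PARTIAL leaves fall back to the exact test."""
    node = trie
    for digit in quadkey(x, y, bounds, max_depth):
        if node is None:
            return False  # path was pruned: the cell lies outside the polygon
        if node["status"] == INSIDE:
            return True  # the whole cell is inside: no geometry needed
        node = node["children"].get(digit)
    if node is None:
        return False
    if node["status"] == INSIDE:
        return True
    return point_in_polygon(x, y, poly)


def map_points(records, trie, bounds, max_depth, poly):
    """Hypothetical map-side filter over (id, x, y) records."""
    for pid, x, y in records:
        if in_query(x, y, trie, bounds, max_depth, poly):
            yield pid, (x, y)
```

For example, with bounds (0.0, 0.0, 1.0, 1.0), a depth of 3, and a concave query polygon, a point well inside the polygon is accepted as soon as its code prefix reaches an inside cell, and a point whose prefix leaves the tree is rejected without any geometry. The sketch assumes every point lies within the indexed bounds. In a MapReduce setting, one plausible design is to build the tree once per query and share it with all map tasks; the abstract does not say how the paper distributes it.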
Keywords :
data acquisition; data analysis; information filtering; parallel algorithms; quadtrees; query processing; spatial data structures; storage management; CPU loads; Hadoop cluster; I/O load; MapReduce-based parallel polygon retrieval algorithm; cloud service; data acquisition devices; data filtering; data management; data storage; geospatial analysis; hierarchically index; modern distributed computing; prefix tree; quad tree index; query object; sequential algorithm; spatial data analysis; spatial data computation; spatial terrain data; Computational modeling; Distributed databases; Indexes; Partitioning algorithms; Real-time systems; Spatial databases; Tin;
Conference_Title :
Cloud Computing (CLOUD), 2015 IEEE 8th International Conference on
Conference_Location :
New York City, NY
Print_ISBN :
978-1-4673-7286-2
DOI :
10.1109/CLOUD.2015.123