Title :
A Prefix-Filter based Method for Spatio-Textual Similarity Join
Author :
Sitong Liu ; Guoliang Li ; Jianhua Feng
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Abstract :
Location-based services have attracted significant attention as modern mobile phones are commonly equipped with GPS devices. These services generate large amounts of spatio-textual data, which contain both spatial locations and textual descriptions. Since a spatio-textual object may have different representations, possibly because of GPS deviations or differing user descriptions, efficient methods are needed to integrate spatio-textual data from different sources. In this paper we study a new research problem called spatio-textual similarity join: given two sets of spatio-textual objects, find the similar object pairs. We make the following contributions: (1) We develop a filter-and-refine framework and devise several efficient algorithms. We extend the prefix filter technique to generate spatial and textual signatures for the objects and build inverted indexes on top of these signatures. We then generate candidate pairs using the inverted lists of the signatures, and finally refine the candidates to produce the final result. (2) We study how to generate high-quality signatures for spatial information, and develop an MBR-prefix based signature that prunes large numbers of dissimilar object pairs. (3) We propose a hybrid signature scheme that supports textual pruning and spatial pruning simultaneously. (4) Experimental results on real and synthetic datasets show that our algorithms achieve high performance and scale well.
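The filter-and-refine flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it uses only the classic textual prefix filter as the signature and checks the spatial condition during refinement, whereas the paper also develops MBR-prefix and hybrid signatures. The abstract does not define the similarity functions, so the sketch assumes Jaccard similarity on token sets and on MBR areas (overlap area over union area). The names st_join, prefix, jaccard and mbr_jaccard, the thresholds t_text and t_space, and the (id, mbr, tokens) object layout are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Textual Jaccard similarity of two token sets (assumed measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mbr_jaccard(r, s):
    """Assumed spatial measure: overlap area / union area of two MBRs,
    each given as (x1, y1, x2, y2)."""
    w = min(r[2], s[2]) - max(r[0], s[0])
    h = min(r[3], s[3]) - max(r[1], s[1])
    inter = w * h if w > 0 and h > 0 else 0.0
    union = ((r[2] - r[0]) * (r[3] - r[1])
             + (s[2] - s[0]) * (s[3] - s[1]) - inter)
    return inter / union if union > 0 else 0.0


def prefix(tokens, order, t):
    """Prefix signature: the first |x| - ceil(t * |x|) + 1 tokens under a
    global order. Two sets with Jaccard >= t must share a prefix token."""
    ranked = sorted(tokens, key=lambda tok: order[tok])
    # The small epsilon guards against float error pushing ceil() one too high.
    k = len(ranked) - math.ceil(t * len(ranked) - 1e-9) + 1
    return ranked[:k]


def st_join(R, S, t_text, t_space):
    """Return (r_id, s_id) pairs meeting both thresholds.
    R and S are lists of (id, mbr, token_set) tuples."""
    # Global token order: rare tokens first, so prefixes are selective.
    freq = Counter(tok for _, _, toks in R + S for tok in toks)
    ranked = sorted(freq, key=lambda tok: (freq[tok], tok))
    order = {tok: i for i, tok in enumerate(ranked)}

    # Filter step 1: inverted index over the prefix signatures of S.
    index = defaultdict(list)
    for pos, (_, _, toks) in enumerate(S):
        for tok in prefix(toks, order, t_text):
            index[tok].append(pos)

    results = []
    for rid, rmbr, rtoks in R:
        # Filter step 2: candidates share at least one prefix token.
        candidates = set()
        for tok in prefix(rtoks, order, t_text):
            candidates.update(index.get(tok, ()))
        # Refine step: verify both similarities exactly.
        for pos in sorted(candidates):
            sid, smbr, stoks = S[pos]
            if (jaccard(rtoks, stoks) >= t_text
                    and mbr_jaccard(rmbr, smbr) >= t_space):
                results.append((rid, sid))
    return results
```

Because this sketch prunes on text alone, a pair that is textually similar but spatially far apart still reaches the refine step. Cutting such candidates earlier is what the spatial and hybrid signatures mentioned in contributions (2) and (3) are for.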
Keywords :
Global Positioning System; data integration; filtering theory; mobile computing; mobile radio; visual databases; GPS devices; Global Position Systems; MBR-prefix based signature; candidate pair generation; filter-and-refine framework; hybrid signature scheme; inverted index; location-based services; mobile phones; prefix-filter based method; spatial location; spatial pruning; spatial signature generation; spatio-textual data generation; spatio-textual data integration; spatio-textual similarity join; textual descriptions; textual pruning; textual signature generation; user descriptions; Complexity theory; Filtering algorithms; Indexes; Partitioning algorithms; Probes; Sorting; Database Applications; Database Management; Information Search and Retrieval; Information Storage and Retrieval; Information Technology and Systems; MBR prefix; Spatial databases and GIS; Spatio-textual objects; hybrid signature; similarity join
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2013.83