DocumentCode :
710115
Title :
Efficient metric indexing for similarity search
Author :
Lu Chen ; Yunjun Gao ; Xinhan Li ; Jensen, Christian S. ; Gang Chen
Author_Institution :
Coll. of Comput. Sci., Zhejiang Univ., Hangzhou, China
fYear :
2015
fDate :
13-17 April 2015
Firstpage :
591
Lastpage :
602
Abstract :
The goal in similarity search is to find objects similar to a specified query object given a certain similarity criterion. Although useful in many areas, such as multimedia retrieval, pattern recognition, and computational biology, to name but a few, similarity search is not yet supported well by commercial DBMS. This may be due to the complex data types involved and the needs for flexible similarity criteria seen in real applications. We propose an efficient disk-based metric access method, the Space-filling curve and Pivot-based B+-tree (SPB-tree), to support a wide range of data types and similarity metrics. The SPB-tree uses a small set of so-called pivots to reduce significantly the number of distance computations, uses a space-filling curve to cluster the data into compact regions, thus improving storage efficiency, and utilizes a B+-tree with minimum bounding box information as the underlying index. The SPB-tree also employs a separate random access file to efficiently manage a large and complex data. By design, it is easy to integrate the SPB-tree into an existing DBMS. We present efficient similarity search algorithms and corresponding cost models based on the SPB-tree. Extensive experiments using real and synthetic data show that the SPB-tree has much lower construction cost, smaller storage size, and can support more efficient similarity queries with high accuracy cost models than is the case for competing techniques. Moreover, the SPB-tree scales sublinearly with growing dataset size.
Keywords :
indexing; query processing; tree searching; DBMS; Pivot-based B+-tree; computational biology; disk-based metric access method; metric indexing; multimedia retrieval; pattern recognition; similarity search; space-filling curve; Algorithm design and analysis; Extraterrestrial measurements; Hafnium; Indexes; Object recognition; Search problems;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Engineering (ICDE), 2015 IEEE 31st International Conference on
Conference_Location :
Seoul
Type :
conf
DOI :
10.1109/ICDE.2015.7113317
Filename :
7113317
Link To Document :
بازگشت