Title :
The hybrid tree: an index structure for high dimensional feature spaces
Author :
Chakrabarti, Kaushik ; Mehrotra, Sharad
Author_Institution :
Dept. of Comput. Sci., Illinois Univ., Urbana, IL, USA
Abstract :
Feature-based similarity searching is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high-dimensional feature space which is indexed using a multidimensional data structure. Similarity searching then corresponds to a range search over the data structure. Although several data structures have been proposed for feature indexing, none of them is known to scale beyond 10-15 dimensional spaces. This paper introduces the hybrid tree-a multidimensional data structure for indexing high-dimensional feature spaces. Unlike other multidimensional data structures, the hybrid tree cannot be classified as either a pure data partitioning (DP) index structure (such as the R-tree, SS-tree or SR-tree) or a pure space partitioning (SP) one (such as the KDB-tree or hB-tree); rather it combines the positive aspects of the two types of index structures into a single data structure to achieve a search performance which is more scalable to high dimensionalities than either of the above techniques. Furthermore, unlike many data structures (e.g. distance-based index structures like the SS-tree and SR-tree), the hybrid tree can support queries based on arbitrary distance functions. Our experiments on “real” high-dimensional large-size feature databases demonstrate that the hybrid tree scales well to high dimensionality and large database sizes. It significantly outperforms both purely DP-based and SP-based index mechanisms as well as linear scans at all dimensionalities for large-sized databases
Keywords :
database indexing; database theory; query processing; tree data structures; tree searching; very large databases; arbitrary distance functions; data item mapping; data partitioning; database systems; feature indexing; feature-based similarity searching; high-dimensional feature spaces; hybrid tree; index structure; large-sized databases; linear scans; multidimensional data structure; performance; queries; range search; scalable data structure; search performance; space partitioning; Computer science; Data structures; Database systems; Indexing; Laboratories; NASA; Software libraries; Strontium; Tree data structures;
Conference_Titel :
Data Engineering, 1999. Proceedings., 15th International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
0-7695-0071-4
DOI :
10.1109/ICDE.1999.754960