Title :
High dimensional similarity search with space filling curves
Author :
Liao, Swanwa ; Lopez, Mario A. ; Leutenegger, Scott T.
Author_Institution :
Dept. of Math. & Comput. Sci., Denver Univ., CO, USA
Abstract :
We present a new approach for approximate nearest neighbor queries for sets of high dimensional points under any Lt-metric, t=1,...,∞. The proposed algorithm is efficient and simple to implement. The algorithm uses multiple shifted copies of the data points and stores them in up to (d+1) B-trees where d is the dimensionality of the data, sorted according to their position along a space filling curve. This is done in a way that allows us to guarantee that a neighbor within an O(d1+1t/) factor of the exact nearest, can be returned with at most (d+1)log, n page accesses, where p is the branching factor of the B-trees. In practice, for real data sets, our approximate technique finds the exact nearest neighbor between 87% and 99% of the time and a point no farther than the third nearest neighbor between 98% and 100% of the time. Our solution is dynamic, allowing insertion or deletion of points in O(d logp n) page accesses and generalizes easily to find approximate k-nearest neighbors
Keywords :
data mining; database theory; query processing; tree data structures; very large databases; B-trees; approximate nearest neighbor queries; branching factor; data mining; data sets; high dimensional similarity search; page access; space filling curves; Approximation algorithms; Computer science; Data mining; Filling; Image processing; Mathematics; Nearest neighbor searches; Neural networks; Pattern recognition; Sorting;
Conference_Titel :
Data Engineering, 2001. Proceedings. 17th International Conference on
Conference_Location :
Heidelberg
Print_ISBN :
0-7695-1001-9
DOI :
10.1109/ICDE.2001.914876