DocumentCode :
3427533
Title :
High-dimensional similarity joins
Author :
Shim, Kyuseok ; Srikant, Ramakrishnan ; Agrawal, Rakesh
Author_Institution :
IBM Almaden Res. Center, San Jose, CA, USA
fYear :
1997
fDate :
7-11 Apr 1997
Firstpage :
301
Lastpage :
311
Abstract :
Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that utilizes a new index structure, called the ε-kdB tree, for fast spatial similarity joins on high-dimensional points. This index structure reduces the number of neighboring leaf nodes that are considered for the join test, as well as the traversal cost of finding appropriate branches in the internal nodes. The storage cost for internal nodes is independent of the number of dimensions. Hence the proposed index structure scales to high-dimensional data. Empirical evaluation, using synthetic and real-life datasets, shows that a similarity join using the ε-kdB tree is between 2 times and an order of magnitude faster than one using the R+ tree, with the performance gap increasing with the number of dimensions.
Keywords :
data structures; information retrieval systems; relational databases; ε-kdB tree; data mining; high-dimensional similarity joins; index structure; neighboring leaf nodes; real-life datasets; Costs; Data mining; Data structures; Image databases; Image retrieval; Multidimensional systems; Multimedia databases; Music information retrieval; Testing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering, 1997. Proceedings. 13th International Conference on
Conference_Location :
Birmingham
ISSN :
1063-6382
Print_ISBN :
0-8186-7807-0
Type :
conf
DOI :
10.1109/ICDE.1997.581814
Filename :
581814