Title :
An efficient parallel algorithm for high dimensional similarity join
Author :
Alsabti, Khaled ; Ranka, Sanjay ; Singh, Vineet
Author_Institution :
Syracuse Univ., NY, USA
fDate :
30 Mar-3 Apr 1998
Abstract :
Multidimensional similarity join finds pairs of multidimensional points that are within some small distance of each other. The ε-k-d-B tree has been proposed as a data structure that scales better as the number of dimensions increases compared to previous data structures. We present a cost model of the ε-k-d-B tree and use it to optimize the leaf size. We present novel parallel algorithms for the similarity join using the ε-k-d-B tree. A load balancing strategy based on equi-depth histograms is shown to work well for uniform or low-skew situations, whereas another based on weighted equi-depth histograms works far better for high-skew datasets. The latter strategy is only slightly slower than the former strategy for low skew datasets. Further its cost is proportional to the overall cost of the similarity join
Keywords :
database theory; optimisation; parallel algorithms; relational algebra; relational databases; resource allocation; tree data structures; cost model; equi-depth histograms; high dimensional similarity join; high-skew datasets; leaf size optimization; load balancing strategy; low-skew datasets; multidimensional points; multidimensional similarity join; parallel algorithm; tree data structure; Concurrent computing; Cost function; Data structures; Histograms; Information technology; Multidimensional systems; Multimedia databases; Parallel algorithms; Space technology; Tree data structures;
Conference_Titel :
Parallel Processing Symposium, 1998. IPPS/SPDP 1998. Proceedings of the First Merged International ... and Symposium on Parallel and Distributed Processing 1998
Conference_Location :
Orlando, FL
Print_ISBN :
0-8186-8404-6
DOI :
10.1109/IPPS.1998.669980