DocumentCode :
3248626
Title :
High performance data mining using the nearest neighbor join
Author :
Böhm, Christian ; Krebs, Florian
fYear :
2002
fDate :
2002
Firstpage :
43
Lastpage :
50
Abstract :
The similarity join has become an important database primitive to support similarity search and data mining. A similarity join combines two sets of complex objects such that the result contains all pairs of similar objects. Two types of similarity join are well known: the distance range join, where the user defines a distance threshold for the join, and the closest point query (or k-distance join), which retrieves the k most similar pairs. In this paper, we investigate a third important similarity join operation, the k-nearest neighbor join (k-nn join), which combines each point of one point set with its k nearest neighbors in the other set. It has been shown that many standard algorithms of Knowledge Discovery in Databases (KDD), such as k-means and k-medoid clustering, nearest neighbor classification, data cleansing, and the postprocessing of sampling-based data mining, can be implemented on top of the k-nn join operation to achieve performance improvements without affecting the quality of their results. We propose a new algorithm to compute the k-nearest neighbor join using the multipage index (MuX), a specialized index structure for the similarity join. To reduce both CPU and I/O cost, we develop optimal loading and processing strategies.
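To make the difference between the three join types concrete, here is a minimal brute-force sketch of their semantics, assuming points are tuples of floats compared by Euclidean distance. It is not the MuX-based algorithm proposed in the paper: it uses plain nested loops with no index, no loading strategy, and no I/O optimization. The function names (`distance_range_join`, `k_distance_join`, `knn_join`, `kmeans_assign`) are hypothetical and chosen only for illustration.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

def distance_range_join(R: Sequence[Point], S: Sequence[Point], eps: float) -> List[Tuple[int, int]]:
    """Distance range join: every pair (i, j) with dist(R[i], S[j]) <= eps."""
    return [(i, j) for i, r in enumerate(R) for j, s in enumerate(S)
            if math.dist(r, s) <= eps]

def k_distance_join(R: Sequence[Point], S: Sequence[Point], k: int) -> List[Tuple[int, int]]:
    """Closest point query / k-distance join: the k closest pairs over all of R x S."""
    pairs = ((math.dist(r, s), i, j) for i, r in enumerate(R) for j, s in enumerate(S))
    return [(i, j) for _, i, j in heapq.nsmallest(k, pairs)]

def knn_join(R: Sequence[Point], S: Sequence[Point], k: int) -> Dict[int, List[int]]:
    """k-nn join: for EACH point of R, the indices of its k nearest neighbors in S."""
    return {i: [j for _, j in heapq.nsmallest(k, ((math.dist(r, s), j) for j, s in enumerate(S)))]
            for i, r in enumerate(R)}

def kmeans_assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """k-means assignment step expressed as a 1-nn join of points against centroids."""
    nn = knn_join(points, centroids, 1)
    return [nn[i][0] for i in range(len(points))]

R = [(0.0, 0.0), (5.0, 5.0)]
S = [(1.0, 0.0), (0.0, 2.0), (5.0, 6.0)]
print(distance_range_join(R, S, 1.5))  # [(0, 0), (1, 2)]
print(k_distance_join(R, S, 2))        # [(0, 0), (1, 2)]
print(knn_join(R, S, 2))               # {0: [0, 1], 1: [2, 1]}
print(kmeans_assign(R, S))             # [0, 2]
```

Note that, unlike the other two joins, the k-nn join is asymmetric: `knn_join(R, S, k)` and `knn_join(S, R, k)` generally differ, and every point of the outer set appears in the result exactly once. The k-means line shows, in the simplest possible form, why clustering can be layered on this operation: each iteration's assignment step is a 1-nn join between the data points and the current centroids.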
Keywords :
data mining; database theory; query processing; database primitive; multidimensional databases; multipage index; similarity join; similarity search; Acceleration; Biomedical informatics; Clustering algorithms; Cost function; Data analysis; Databases; Multidimensional systems; Nearest neighbor searches; Performance gain
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-1754-4
Type :
conf
DOI :
10.1109/ICDM.2002.1183884
Filename :
1183884