مرکز منطقه ای اطلاع رساني علوم و فناوري - List of Twin Clusters: A Data Structure for Similarity Joins in Metric Spaces

DocumentCode :

3292583

Title :

List of Twin Clusters: A Data Structure for Similarity Joins in Metric Spaces

Author :

Paredes, Rodrigo ; Reyes, Nora

Author_Institution :

Univ. de Chile, Santiago

fYear :

2008

fDate :

11-12 April 2008

Firstpage :

131

Lastpage :

138

Abstract :

The metric space model abstracts many proximity or similarity problems, where the most frequently considered primitives are range and k-nearest neighbor search, leaving out the similarity join, an extremely important primitive. In fact, despite the great attention that this primitive has received in traditional and even multidimensional databases, little has been done for general metric databases. We consider a particular type of similarity join: Given two sets of objects and a distance threshold r, find all the object pairs (one from each set) at distance at most r. For this sake, we devise a new metric index, coined List of Twin Clusters, which indexes both sets jointly (instead of the natural approach of indexing one or both sets independently). Our results show significant speedups over the basic quadratic-time naive alternative. Furthermore, we show that our technique can be easily extended to other similarity join variants, e.g., finding the k-closest pairs.

Keywords :

data structures; database management systems; query processing; data structure; general metric databases; k-nearest neighbor search; metric index; metric spaces; multidimensional databases; similarity joins; twin clusters; Cleaning; Data mining; Data structures; Databases; Extraterrestrial measurements; Information retrieval; Performance evaluation; Recruitment; Resumes; Search problems; Similarity joins;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Similarity Search and Applications, 2008. SISAP 2008. First International Workshop on

Conference_Location :

Belfast

Print_ISBN :

0-7695-3101-6

Type :

conf

DOI :

10.1109/SISAP.2008.21

Filename :

4492934

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3292583