Regional Information Center for Science and Technology (RICeST) - An incremental clustering scheme for duplicate detection in large databases

DocumentCode :

2533842

Title :

An incremental clustering scheme for duplicate detection in large databases

Author :

Cesario, Eugenio ; Folino, Francesco ; Manco, Giuseppe ; Pontieri, Luigi

Author_Institution :

ICAR-CNR, Rende, Italy

fYear :

2005

fDate :

25-27 July 2005

Firstpage :

Lastpage :

Abstract :

We propose an incremental algorithm for clustering duplicate tuples in large databases. It assigns any new tuple t to the cluster containing the database tuples that are most similar to t (and hence likely to refer to the same real-world entity as t). The core of the approach is a hash-based indexing technique that tends to assign highly similar objects to the same buckets. Empirical evaluation shows that the proposed method yields a considerable efficiency improvement over a state-of-the-art index structure for proximity searches in metric spaces.
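The abstract does not say which hash family or similarity measure the authors use, so the following is only a minimal sketch of the general idea, under stated assumptions: tuples are compared by Jaccard similarity over character 3-grams, and MinHash signatures split into bands serve as the bucket keys that tend to bring similar tuples together. The names `IncrementalClusterer`, `qgrams`, `minhash`, and the parameters `bands`, `rows`, and `threshold` are hypothetical and do not come from the paper.

```python
import hashlib
from collections import defaultdict


def qgrams(text, q=3):
    """Return the set of character q-grams of a case- and space-normalized string."""
    s = " ".join(text.lower().split())
    if len(s) < q:
        return {s}
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def minhash(grams, num_hashes):
    """MinHash signature: for each seed, the minimum salted hash over all q-grams."""
    sig = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little")
            for g in grams))
    return tuple(sig)


class IncrementalClusterer:
    """Assumed stand-in for a hash-based index; not the authors' actual structure."""

    def __init__(self, bands=8, rows=2, threshold=0.5):
        self.bands, self.rows, self.threshold = bands, rows, threshold
        self.buckets = defaultdict(list)  # (band index, band values) -> tuple ids
        self.grams = []                   # tuple id -> q-gram set
        self.cluster_of = []              # tuple id -> cluster id
        self.num_clusters = 0

    def add(self, text):
        """Insert one tuple (given as a string) and return its cluster id."""
        grams = qgrams(text)
        sig = minhash(grams, self.bands * self.rows)
        keys = [(b, sig[b * self.rows:(b + 1) * self.rows])
                for b in range(self.bands)]
        # Only tuples sharing at least one bucket are compared exactly.
        candidates = {tid for key in keys for tid in self.buckets.get(key, ())}
        best_id, best_sim = None, 0.0
        for tid in candidates:
            sim = jaccard(grams, self.grams[tid])
            if sim > best_sim:
                best_id, best_sim = tid, sim
        if best_id is not None and best_sim >= self.threshold:
            cluster = self.cluster_of[best_id]   # join the most similar tuple's cluster
        else:
            cluster = self.num_clusters          # no close match: open a new cluster
            self.num_clusters += 1
        new_id = len(self.grams)
        self.grams.append(grams)
        self.cluster_of.append(cluster)
        for key in keys:
            self.buckets[key].append(new_id)
        return cluster
```

Each call to `add` touches only the buckets of the incoming tuple, which is what makes the scheme incremental. Near-duplicates that differ in a few characters land in a shared bucket only with some probability, which depends on the assumed `bands` and `rows` settings.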

Keywords :

database indexing; database tuples; duplicate detection; duplicate tuples; hash-based indexing; incremental clustering; index structure; large databases; metric spaces; proximity searches; Clustering algorithms; Clustering methods; Couplings; Data engineering; Delay; Extraterrestrial measurements; Indexing; Information retrieval; Scalability; Spatial databases;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

9th International Database Engineering and Application Symposium (IDEAS 2005)

ISSN :

1098-8068

Print_ISBN :

0-7695-2404-4

Type :

conf

DOI :

10.1109/IDEAS.2005.10

Filename :

1540899

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2533842