DocumentCode :
2835197
Title :
IKMC: An Improved K-Medoids Clustering Method for Near-Duplicated Records Detection
Author :
Pei, Ying ; Xu, Jungang ; Cen, Zhiwang ; Sun, Jian
Author_Institution :
Sch. of Inf. Sci. & Eng., Grad. Univ. of Chinese Acad. of Sci., Beijing, China
fYear :
2009
fDate :
11-13 Dec. 2009
Firstpage :
1
Lastpage :
4
Abstract :
This paper proposes an improved K-medoids clustering algorithm (IKMC) for detecting near-duplicated records. It treats every record in the database as a separate data object, uses the edit-distance method and attribute weights to compute similarity values between records, and then detects duplicated records by clustering these similarity values. The algorithm automatically adjusts the number of clusters by comparing the similarity values with a preset similarity threshold, and it avoids the large number of I/O operations that the traditional "sort/merge" algorithm needs for sorting. Experiments show that the algorithm has good detection accuracy and high availability.
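The pipeline outlined in the abstract (weighted edit-distance similarity followed by threshold-driven clustering) might be sketched roughly as below. This is an illustrative simplification and not the paper's IKMC algorithm: it makes a single pass and never re-selects medoids, and the function names, the length-normalised similarity, and the sample weights and threshold are all assumptions introduced here.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def field_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / longer length, in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def record_similarity(r1: Sequence[str], r2: Sequence[str],
                      weights: Sequence[float]) -> float:
    """Weighted average of per-attribute similarities."""
    total = sum(weights)
    return sum(w * field_similarity(x, y)
               for w, x, y in zip(weights, r1, r2)) / total


def cluster_by_threshold(records: Sequence[Sequence[str]],
                         weights: Sequence[float],
                         threshold: float) -> List[List[int]]:
    """Single-pass sketch: join the most similar medoid if its similarity
    reaches the threshold, otherwise open a new cluster. The first record
    of each cluster serves as its medoid (an assumption; no medoid update)."""
    medoids: List[int] = []
    clusters: List[List[int]] = []
    for idx, rec in enumerate(records):
        best, best_sim = None, threshold
        for c, m in enumerate(medoids):
            sim = record_similarity(rec, records[m], weights)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            medoids.append(idx)
            clusters.append([idx])
        else:
            clusters[best].append(idx)
    return clusters


# Hypothetical sample data: two spellings of one person, plus a distinct record.
records = [("John Smith", "Beijing"),
           ("Jon Smith", "Beijing"),
           ("Alice Wong", "Wuhan")]
clusters = cluster_by_threshold(records, weights=[0.7, 0.3], threshold=0.8)
```

Because a record that matches no existing medoid opens a new cluster, the number of clusters is determined by the threshold rather than fixed in advance, which is the behaviour the abstract attributes to the similarity threshold.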
Keywords :
database management systems; merging; pattern clustering; records management; sorting; database; edit-distance method; improved K-medoids clustering method; merge algorithm; near-duplicated records detection; sort algorithm; Availability; Clustering algorithms; Clustering methods; Database systems; Information science; Information systems; Object detection; Sorting; Space technology; Sun;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence and Software Engineering, 2009. CiSE 2009. International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-4507-3
Electronic_ISBN :
978-1-4244-4507-3
Type :
conf
DOI :
10.1109/CISE.2009.5364382
Filename :
5364382