Regional Information Center for Science and Technology (RICeST) - An optimal feature selection method for approximately duplicate records detecting

DocumentCode :

2554454

Title :

An optimal feature selection method for approximately duplicate records detecting

Author :

Hua, Quanping ; Xiang, Ming ; Sun, Fangyi

Author_Institution :

Mechano-Electron. & Inf. Eng. Institute, Zhejiang Textile & Fashion Vocational & Tech. Colle, Ningbo, China

fYear :

2010

fDate :

16-18 April 2010

Firstpage :

446

Lastpage :

450

Abstract :

When detecting and recognizing duplicate records in large data sets, detection accuracy is low and detection cost is high because the data sources are complex and the records carry too many feature attributes. To address these problems, we propose an optimal feature selection method based on fuzzy clustering in groups. First, the method processes the record attributes in groups, which effectively reduces the dimensionality of the attributes and yields representative records within the groups. It then detects approximately duplicate records within the groups by comparing record similarity. Theoretical analysis and experiments show that the method achieves higher identification accuracy and detection efficiency, and that it better solves the problem of recognizing approximately duplicate records in large data sets.
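
The abstract does not state the similarity measure, the grouping rule, or the fuzzy clustering step, so the following is only a minimal sketch of the final comparison stage: matching records on a reduced set of attributes. Everything in it is an assumption made for illustration, not the authors' method: the function names, the `weights` mapping that stands in for the selected attributes, the use of Python's `difflib.SequenceMatcher` as the string similarity, and the threshold of 0.85.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two field values."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def record_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted mean of per-field similarities.

    Only the attributes named in `weights` are compared; this stands in
    for the reduced attribute set (hypothetical, chosen by hand here).
    """
    total = sum(weights.values())
    score = sum(w * field_similarity(r1[k], r2[k]) for k, w in weights.items())
    return score / total


def find_approximate_duplicates(records: list, weights: dict,
                                threshold: float = 0.85) -> list:
    """Return index pairs (i, j), i < j, whose similarity reaches the threshold.

    The threshold is an assumed value; pairwise comparison is quadratic,
    so in practice it would be run inside each group, not over all records.
    """
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if record_similarity(records[i], records[j], weights) >= threshold:
                pairs.append((i, j))
    return pairs
```

Attributes left out of `weights` are ignored entirely, which is the point of selecting features before comparing: fewer fields to compare per pair, and noisy fields cannot pull down the score of a true duplicate.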

Keywords :

feature extraction; fuzzy set theory; pattern clustering; detection efficiency; duplicate records detection; fuzzy clustering; identification accuracy; optimal feature selection method; Clustering algorithms; Clustering methods; Cost function; Data engineering; Data mining; Dictionaries; Educational institutions; Optimization methods; Sun; Textile technology; approximately duplicated records; fuzzy clustering; optimal feature selection; property optimization; similarity;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Information Management and Engineering (ICIME), 2010 The 2nd IEEE International Conference on

Conference_Location :

Chengdu

Print_ISBN :

978-1-4244-5263-7

Electronic_ISBN :

978-1-4244-5265-1

Type :

conf

DOI :

10.1109/ICIME.2010.5478101

Filename :

5478101

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2554454