DocumentCode :
2554454
Title :
An optimal feature selection method for approximately duplicate records detecting
Author :
Hua, Quanping ; Xiang, Ming ; Sun, Fangyi
Author_Institution :
Mechano-Electron. & Inf. Eng. Institute, Zhejiang Textile & Fashion Vocational & Tech. College, Ningbo, China
fYear :
2010
fDate :
16-18 April 2010
Firstpage :
446
Lastpage :
450
Abstract :
When detecting and recognizing duplicate records in large data sets, detection accuracy is low and detection cost is high, because the data sources are complex and the records carry too many feature attributes. To address these problems, we propose an optimal feature selection method based on fuzzy clustering in groups. First, the method processes record attributes in groups, which effectively reduces the dimensionality of the record attributes and yields representative records for each group. Then it detects approximately duplicate records within each group using a similarity-based comparison method. Theoretical analysis and experiments show that the method achieves higher identification accuracy and detection efficiency, and that it better solves the problem of recognizing approximately duplicate records in large data sets.
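The second step of the abstract (comparing records by similarity only within groups) can be illustrated with a minimal sketch. This is not the authors' method: the fuzzy-clustering attribute selection is not reproduced, and the grouping key (`group_key`), the selected attributes and their weights (`WEIGHTS`), the threshold of 0.8, and the use of Python's `difflib.SequenceMatcher` as the field similarity measure are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def field_similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1] (assumed measure: difflib ratio)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def record_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted average of field similarities over the selected attributes only."""
    total = sum(weights.values())
    return sum(w * field_similarity(r1[f], r2[f]) for f, w in weights.items()) / total


def detect_duplicates(records, group_key, weights, threshold=0.8):
    """Group records, then compare pairs only inside each group.

    Returns sorted (i, j, score) tuples for pairs whose weighted
    similarity reaches the (hypothetical) threshold.
    """
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[group_key(rec)].append(idx)
    pairs = []
    for members in groups.values():
        for i, j in combinations(members, 2):
            score = record_similarity(records[i], records[j], weights)
            if score >= threshold:
                pairs.append((i, j, round(score, 3)))
    return sorted(pairs)


def group_key(rec: dict) -> str:
    """Hypothetical grouping rule: records from the same city share a group."""
    return rec["city"].lower()


# Hypothetical sample data: records 0 and 1 are near-duplicates.
RECORDS = [
    {"name": "Zhang Wei", "city": "Ningbo", "phone": "0574-1234567"},
    {"name": "Zhang  Wei", "city": "Ningbo", "phone": "0574 1234567"},
    {"name": "Li Na", "city": "Chengdu", "phone": "028-7654321"},
]
# Hypothetical "selected" attributes standing in for the reduced attribute set.
WEIGHTS = {"name": 0.5, "phone": 0.5}

matches = detect_duplicates(RECORDS, group_key, WEIGHTS)

if __name__ == "__main__":
    print(matches)
```

Restricting comparisons to records in the same group avoids scoring every pair in the data set, and comparing only a reduced set of weighted attributes keeps each comparison cheap; these are the two cost reductions the abstract points to, although the paper's actual grouping and attribute selection rely on fuzzy clustering rather than the fixed key and weights assumed here.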
Keywords :
feature extraction; fuzzy set theory; pattern clustering; detection efficiency; duplicate records detection; fuzzy clustering; identification accuracy; optimal feature selection method; Clustering algorithms; Clustering methods; Cost function; Data engineering; Data mining; Dictionaries; Educational institutions; Optimization methods; Sun; Textile technology; approximately duplicated records; fuzzy clustering; optimal feature selection; property optimization; similarity;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Information Management and Engineering (ICIME), 2010 The 2nd IEEE International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-5263-7
Electronic_ISBN :
978-1-4244-5265-1
Type :
conf
DOI :
10.1109/ICIME.2010.5478101
Filename :
5478101