Title : 
Research of Duplicate Record Cleaning Technology Based on a Reformative Keywords Matching Algorithm

Author : 
Yan Hu ; Wei Li ; Ying Qiu ; Wei Wu

Author_Institution : 
Sch. of Comput. Sci. & Technol., Wuhan Univ. of Technol., Wuhan

Abstract : 
Based on an analysis of the shortcomings of existing Chinese matching algorithms and an examination of the characteristics of approximately duplicate records, this paper proposes a duplicate record cleaning method built on an improved ("reformative") keyword matching algorithm. Experiments show that the method noticeably improves the recall and precision of duplicate record detection.
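The abstract reports gains in Recall and Precision without defining them. The sketch below uses the standard definitions for duplicate-pair detection: precision is the share of flagged pairs that are true duplicates, and recall is the share of true duplicate pairs that were flagged. The function name `precision_recall` and the representation of results as sets of record-id pairs are hypothetical choices for illustration; this is not the paper's evaluation code.

```python
def precision_recall(detected, actual):
    """Precision and recall of duplicate-pair detection (hypothetical helper).

    detected: iterable of (id, id) pairs flagged as duplicates by a cleaner
    actual:   iterable of (id, id) pairs that truly are duplicates
    Pairs are normalized so (a, b) and (b, a) count as the same pair.
    """
    def norm(pairs):
        return {tuple(sorted(p)) for p in pairs}

    d, a = norm(detected), norm(actual)
    hits = len(d & a)  # correctly identified duplicate pairs
    precision = hits / len(d) if d else 0.0
    recall = hits / len(a) if a else 0.0
    return precision, recall


# Usage: 3 pairs flagged, 4 true duplicate pairs, 2 of them found.
p, r = precision_recall({(1, 2), (3, 4), (5, 6)},
                        {(2, 1), (3, 4), (7, 8), (9, 10)})
print(p, r)
```

With two correct pairs out of three flagged and four true duplicates, precision is 2/3 and recall is 1/2; an improved matcher would aim to raise both at once.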

Keywords : 
data mining; data warehouses; pattern matching; Chinese matching algorithm; duplicate record cleaning technology; reformative keyword matching algorithm; Algorithm design and analysis; Cleaning; Computer science; Data analysis; Data handling; Databases; Internet

Conference_Title : 
E-Business and Information System Security, 2009. EBISS '09. International Conference on

Conference_Location : 
Wuhan

Print_ISBN : 
978-1-4244-2909-7

Electronic_ISBN : 
978-1-4244-2910-3

DOI : 
10.1109/EBISS.2009.5138036