DocumentCode
2346737
Title
Application of clustering and association methods in data cleaning
Author
Ciszak, Lukasz
Author_Institution
Inst. of Comput. Sci., Warsaw Univ. of Technol., Warsaw
fYear
2008
fDate
20-22 Oct. 2008
Firstpage
97
Lastpage
103
Abstract
Data cleaning is the process of maintaining data quality in information systems. Current data cleaning solutions require reference data to identify incorrect or duplicate entries. This article proposes using data mining in data cleaning as an effective way to discover reference data and validation rules from the data itself. Two algorithms for data attribute correction, designed by the author, are presented. Both algorithms use data mining methods. Experimental results show that both algorithms can effectively clean text attributes without external reference data.
Keywords
data mining; association methods; clustering methods; data attribute correction; data cleaning; data quality; information systems; Cleaning
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Information Technology, 2008. IMCSIT 2008. International Multiconference on
Conference_Location
Wisla
Print_ISBN
978-83-60810-14-9
Type
conf
DOI
10.1109/IMCSIT.2008.4747224
Filename
4747224