Application of clustering and association methods in data cleaning

Author

Ciszak, Lukasz

Author_Institution

Inst. of Comput. Sci., Warsaw Univ. of Technol., Warsaw

fYear

2008

fDate

20-22 Oct. 2008

Firstpage

Lastpage

103

Abstract

Data cleaning is the process of maintaining data quality in information systems. Current data cleaning solutions require reference data to identify incorrect or duplicate entries. This article proposes using data mining for data cleaning, as an effective way to discover reference data and validation rules from the data itself. Two algorithms designed by the author for data attribute correction are presented, both of which rely on data mining methods. Experimental results show that both algorithms can effectively clean text attributes without external reference data.

Keywords

data mining; association methods; clustering methods; data attribute correction; data cleaning; data quality; information systems; Cleaning

fLanguage

English

Publisher

IEEE

Conference_Title

Computer Science and Information Technology, 2008. IMCSIT 2008. International Multiconference on

Conference_Location

Wisła

Print_ISBN

978-83-60810-14-9

Type

conf

DOI

10.1109/IMCSIT.2008.4747224

Filename

4747224

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2346737