• DocumentCode
    2346737
  • Title
    Application of clustering and association methods in data cleaning
  • Author
    Ciszak, Lukasz
  • Author_Institution
    Inst. of Comput. Sci., Warsaw Univ. of Technol., Warsaw
  • fYear
    2008
  • fDate
    20-22 Oct. 2008
  • Firstpage
    97
  • Lastpage
    103
  • Abstract
    Data cleaning is the process of maintaining data quality in information systems. Current data cleaning solutions require reference data to identify incorrect or duplicate entries. This article proposes applying data mining to data cleaning as an effective way to discover reference data and validation rules from the data itself. Two algorithms designed by the author for data attribute correction are presented, both of which use data mining methods. Experimental results show that both algorithms can effectively clean text attributes without external reference data.
  • Keywords
    data mining; association methods; clustering methods; data attribute correction; data cleaning; data quality; information systems; Cleaning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Technology, 2008. IMCSIT 2008. International Multiconference on
  • Conference_Location
    Wisła
  • Print_ISBN
    978-83-60810-14-9
  • Type
    conf
  • DOI
    10.1109/IMCSIT.2008.4747224
  • Filename
    4747224