• DocumentCode
    2760318
  • Title
    An Application of Data Mining to Identify Data Quality Problems
  • Author
    Januzaj, Eshref ; Januzaj, Visar
  • Author_Institution
    Dept. of Data Anal., MALI - Inf. Technol., Kosova
  • fYear
    2009
  • fDate
    11-16 Oct. 2009
  • Firstpage
    17
  • Lastpage
    22
  • Abstract
    Modern information systems consist of many distributed computer and database systems. Integrating such distributed data into a single data warehouse system runs into the well-known problem of low data quality. In this paper, we present an approach that facilitates the dynamic identification of spurious and error-prone data stored in a large data warehouse. The identification of data quality problems is based on data mining techniques such as clustering, subspace clustering, and classification. Furthermore, we demonstrate the applicability of our approach to real data in a case study. The experimental results show that our approach efficiently identifies data quality problems.
  • Keywords
    data mining; data warehouses; distributed databases; pattern classification; pattern clustering; data quality problems; data warehouse system; database systems; distributed computer; distributed data; information systems; low data quality; subspace classification; subspace clustering; Application software; Companies; Computer applications; Data analysis; Data engineering; Data mining; Data warehouses; Database systems; Distributed computing; Internet; Classification; Clustering; Data Mining; Data Quality
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Engineering Computing and Applications in Sciences, 2009. ADVCOMP '09. Third International Conference on
  • Conference_Location
    Sliema
  • Print_ISBN
    978-1-4244-5082-4
  • Electronic_ISBN
    978-0-7695-3829-7
  • Type
    conf
  • DOI
    10.1109/ADVCOMP.2009.11
  • Filename
    5359651