DocumentCode :
2760318
Title :
An Application of Data Mining to Identify Data Quality Problems
Author :
Januzaj, Eshref ; Januzaj, Visar
Author_Institution :
Dept. of Data Anal., MALI - Inf. Technol., Kosova
fYear :
2009
fDate :
11-16 Oct. 2009
Firstpage :
17
Lastpage :
22
Abstract :
Modern information systems consist of many distributed computer and database systems. Integrating such distributed data into a single data warehouse system runs into the well-known problem of low data quality. In this paper we present an approach that facilitates the dynamic identification of spurious and error-prone data stored in a large data warehouse. The identification of data quality problems is based on data mining techniques such as clustering, subspace clustering, and classification. Furthermore, we demonstrate the applicability of our approach to real data through a case study. The experimental results show that our approach efficiently identifies data quality problems.
Keywords :
data mining; data warehouses; distributed databases; pattern classification; pattern clustering; data mining; data quality problems; data warehouse system; database systems; distributed computer; distributed data; information systems; low data quality; subspace classification; subspace clustering; Application software; Companies; Computer applications; Data analysis; Data engineering; Data mining; Data warehouses; Database systems; Distributed computing; Internet; Classification; Clustering; Data Mining; Data Quality;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advanced Engineering Computing and Applications in Sciences, 2009. ADVCOMP '09. Third International Conference on
Conference_Location :
Sliema
Print_ISBN :
978-1-4244-5082-4
Electronic_ISBN :
978-0-7695-3829-7
Type :
conf
DOI :
10.1109/ADVCOMP.2009.11
Filename :
5359651