Title :
An Enhanced Technique to Clean Data in the Data Warehouse
Author :
Hamad, Mortadha M. ; Jihad, Alaa Abdulkhar
Author_Institution :
Coll. of Comput., Univ. of Anbar, Ramadi, Iraq
Abstract :
Data quality is a critical factor in the success of data warehousing projects. Improving data quality matters in a data warehouse because the warehouse is used for decision support, which requires accurate data. Many errors and inconsistencies arise in data sets that are brought in from several sources. Data cleaning is the process of identifying errors in the data and then removing or correcting them. Several data cleaning methods exist, but they generally clean data inefficiently because they cope poorly with the variety of errors present. In this paper we present an enhanced technique for cleaning data in the data warehouse, using a new algorithm that detects and corrects most error types and expected problems, such as lexical errors, domain format errors, irregularities, integrity constraint violations, and duplicates.
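The error categories named in the abstract can be illustrated with a minimal detection sketch. This is not the authors' algorithm, which the abstract does not describe. The field names, sample rows, alias table, and the rule chosen for each category are all assumptions made for illustration, and the sketch only flags problems rather than correcting them.

```python
import re

# All names below are hypothetical; none come from the paper.
EXPECTED_FIELDS = ("id", "name", "city", "date", "salary")
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")  # assumed target date format
CITY_ALIASES = {"bgd": "Baghdad"}                # assumed non-uniform abbreviation


def normalize_name(name):
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def find_problems(rows):
    """Return (row_index, category) pairs for each problem detected."""
    problems = []
    seen = {}
    for index, row in enumerate(rows):
        # Lexical error (assumed reading): the row lacks expected fields.
        if any(field not in row for field in EXPECTED_FIELDS):
            problems.append((index, "lexical"))
            continue
        # Domain format error: the value does not match the expected pattern.
        if not DATE_PATTERN.fullmatch(row["date"]):
            problems.append((index, "domain_format"))
        # Irregularity: a known abbreviation is used instead of the full value.
        if row["city"].lower() in CITY_ALIASES:
            problems.append((index, "irregularity"))
        # Integrity constraint violation: salary must not be negative.
        if row["salary"] < 0:
            problems.append((index, "integrity"))
        # Duplicate: same normalized name and date as an earlier row.
        key = (normalize_name(row["name"]), row["date"])
        if key in seen:
            problems.append((index, "duplicate"))
        else:
            seen[key] = index
    return problems


SAMPLE_ROWS = [
    {"id": 1, "name": "Ali Hasan", "city": "Ramadi", "date": "2011-03-14", "salary": 900},
    {"id": 2, "name": "Sara Omar", "city": "Ramadi", "date": "14/03/2011", "salary": 850},
    {"id": 3, "name": "Omar Zaid", "city": "BGD", "date": "2011-05-02", "salary": 700},
    {"id": 4, "name": "Noor Adel", "city": "Dubai", "date": "2011-06-20", "salary": -50},
    {"id": 5, "name": "ali  hasan", "city": "Ramadi", "date": "2011-03-14", "salary": 900},
    {"id": 6, "name": "Huda Sami", "city": "Dubai"},
]

for index, category in find_problems(SAMPLE_ROWS):
    print(index, category)
```

Each sample row after the first triggers exactly one category, so the checks can be read independently. A real pipeline would need domain-specific rules and a correction step for each category.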
Keywords :
data handling; data warehouses; data cleaning; data quality; data warehouse; decision support; domain format error; duplicates; integrity constraint violation; lexical error; Cities and towns; Cleaning; Computer science; Data mining; Remuneration; Extraction-Transformation-Loading (ETL); data set; data warehouse (DW)
Conference_Title :
Developments in E-systems Engineering (DeSE), 2011
Conference_Location :
Dubai
Print_ISBN :
978-1-4577-2186-1
DOI :
10.1109/DeSE.2011.32