Title :
HADCLEAN: A hybrid approach to data cleaning in data warehouses
Author :
Paul, A. ; Ganesan, V. ; Challa, J.S. ; Sharma, Yogesh
Author_Institution :
Dept. of Comput. Sci. & Inf. Syst., Birla Inst. of Technol. & Sci., Pilani, India
Abstract :
Data cleaning is an important part of the data warehouse management process. It is not an easy process, because many different types of unclean data (bad data, incomplete data, typos, etc.) can be present. Moreover, whether a data item is clean or dirty depends heavily on the nature and source of the raw data. Many attempts have been made to clean data using blocking algorithms, phonetic algorithms, and similar techniques. This paper presents HADCLEAN, a hybrid approach to data cleaning that combines modified versions of the PNRS and transitive closure algorithms.
Keywords :
data analysis; data warehouses; HADCLEAN; PNRS; blocking algorithm; data cleaning; data warehouse management process; hybrid approach; phonetic algorithm; raw data; transitive closure algorithm; Algorithm design and analysis; Cleaning; Data warehouses; Dictionaries; Heuristic algorithms; Mobile communication; Standards; data warehouse; near miss; transitive closure
Conference_Title :
Information Retrieval & Knowledge Management (CAMP), 2012 International Conference on
Conference_Location :
Kuala Lumpur
Print_ISBN :
978-1-4673-1091-8
DOI :
10.1109/InfRKM.2012.6205022