Title :
SmartClean: An Incremental Data Cleaning Tool
Author :
Oliveira, Paulo ; Rodrigues, Fátima ; Henriques, Pedro
Author_Institution :
Comput. Sci. Dept., Inst. of Eng. - Polytech. of Porto, Porto, Portugal
Abstract :
This paper presents the SmartClean tool, whose purpose is to detect and correct data quality problems (DQPs). Compared with existing tools, SmartClean has one main advantage: the user does not need to specify the execution sequence of the data cleaning operations. Instead, a predefined execution sequence was developed, and the problems are manipulated (i.e., detected and corrected) following that sequence. The sequence also supports the incremental execution of the operations. In this paper, the underlying architecture of the tool is presented and its components are described in detail. The validity of the tool, and consequently of the architecture, is demonstrated through a case study. Although SmartClean has cleaning capabilities at all other levels, this paper describes only those related to the attribute value level.
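The abstract does not spell out SmartClean's actual execution sequence. The following is a minimal sketch, assuming a hypothetical three-step attribute-level sequence (missing value, extra spaces, case inconsistency) chosen only for illustration. It shows the two ideas the abstract names: operations applied in a fixed, tool-defined order so the user does not have to choose one, and incremental execution that processes only records added since the last run. All names (`Operation`, `SEQUENCE`, `IncrementalCleaner`) are hypothetical and are not SmartClean's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


@dataclass
class Operation:
    name: str                            # hypothetical DQP name
    detect: Callable[[Record], bool]     # returns True if the problem is present
    correct: Callable[[Record], Record]  # returns a corrected copy of the record


def _squash(s: str) -> str:
    # Collapse runs of whitespace and trim the ends.
    return " ".join(s.split())


# Hypothetical fixed sequence (not taken from the paper). Order matters:
# missing values are handled first, so the later format checks never see None.
SEQUENCE: List[Operation] = [
    Operation("missing_value",
              lambda r: r.get("city") in (None, ""),
              lambda r: {**r, "city": "Unknown"}),
    Operation("extra_spaces",
              lambda r: r["city"] != _squash(r["city"]),
              lambda r: {**r, "city": _squash(r["city"])}),
    Operation("case_inconsistency",
              lambda r: r["city"] != r["city"].title(),
              lambda r: {**r, "city": r["city"].title()}),
]


class IncrementalCleaner:
    def __init__(self, sequence: List[Operation]) -> None:
        self.sequence = sequence
        self.cleaned: List[Record] = []
        self.log: List[Tuple[int, str]] = []  # (record index, operation applied)

    def run(self, table: List[Record]) -> List[Record]:
        # Only records appended since the previous run are processed.
        for i in range(len(self.cleaned), len(table)):
            rec = table[i]
            for op in self.sequence:  # user never chooses the order
                if op.detect(rec):
                    rec = op.correct(rec)
                    self.log.append((i, op.name))
            self.cleaned.append(rec)
        return self.cleaned


if __name__ == "__main__":
    table: List[Record] = [{"city": "  porto "}, {"city": None}]
    cleaner = IncrementalCleaner(SEQUENCE)
    print(cleaner.run(table))
    table.append({"city": "LISBOA"})
    print(cleaner.run(table))  # only the new record is cleaned
```

Keeping the already-cleaned prefix and resuming from its length is one simple way to make a rerun incremental; a real tool would more likely track changed rows by key or timestamp.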
Keywords :
data handling; software tools; SmartClean tool; data quality problems; execution sequence; incremental data cleaning tool; Cleaning; Computer science; Data analysis; Data engineering; Multidimensional systems; Performance analysis; Performance evaluation; Software quality; Software tools; Testing; Architecture; Correction; Data Cleaning; Data Quality Problems; Detection; Tool;
Conference_Title :
Quality Software, 2009. QSIC '09. 9th International Conference on
Conference_Location :
Jeju
Print_ISBN :
978-1-4244-5912-4
DOI :
10.1109/QSIC.2009.67