Title :
Extending data cleaning graphs for supporting user involvement
Author :
Galhardas, Helena ; Lopes, Antonia ; Santos, Emanuel
Author_Institution :
INESC-ID e, Lisbon, Portugal
Abstract :
A data cleaning or Extract-Transform-Load (ETL) process is usually modeled as a data transformation graph. These graphs typically involve a large number of data transformations and must handle large amounts of data. The involvement of the users responsible for executing data cleaning processes over real data is crucial for tuning data transformations and for manually correcting data items that cannot be handled automatically. In this paper, we extend the notion of a data cleaning graph so that it better supports user interaction in data cleaning processes. We propose that data cleaning graphs contain: (i) data quality constraints that help users identify the points in the graph and the records that require their attention; and (ii) manual data repairs that represent the way users can supply the knowledge required to manually clean some data records.
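The two extensions named in the abstract can be illustrated with a minimal sketch. It is not the paper's formalism: every name here (`Node`, `CleaningGraph`, `repairs`, `normalize_city`) is hypothetical, the graph is simplified to a linear chain of transformations, and a record is assumed to be a dictionary with an `id` key.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


def always_ok(rec: Record) -> bool:
    """Default constraint: accept every record."""
    return True


@dataclass
class Node:
    """One transformation plus a quality constraint on its output (hypothetical model)."""
    name: str
    transform: Callable[[Record], Record]
    constraint: Callable[[Record], bool] = always_ok  # True means the record is acceptable


@dataclass
class CleaningGraph:
    nodes: List[Node]  # assumption: a linear chain instead of a general graph
    # Manual data repairs: (node name, record id) -> field values supplied by the user.
    repairs: Dict[Tuple[str, str], Record] = field(default_factory=dict)

    def run(self, records: List[Record]) -> Tuple[List[Record], List[Tuple[str, str]]]:
        flagged = []  # (node name, record id) pairs that need the user's attention
        for node in self.nodes:
            out = []
            for rec in records:
                rec = node.transform(dict(rec))  # copy so the input is not mutated
                rec.update(self.repairs.get((node.name, rec["id"]), {}))
                if not node.constraint(rec):
                    flagged.append((node.name, rec["id"]))
                out.append(rec)
            records = out
        return records, flagged


def normalize_city(rec: Record) -> Record:
    rec["city"] = rec["city"].strip().title()
    return rec


graph = CleaningGraph([Node("normalize_city", normalize_city, lambda r: r["city"] != "")])
data = [{"id": "1", "city": " lisbon "}, {"id": "2", "city": "  "}]

cleaned, flagged = graph.run(data)  # record "2" violates the constraint
graph.repairs[("normalize_city", "2")] = {"city": "Chaves"}  # user supplies the missing value
repaired, still_flagged = graph.run(data)
```

In this sketch the constraint tells the user where to look (which node and which record), and the repair is stored alongside the graph, so re-running the process reapplies the user's correction.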
Keywords :
data handling; graph grammars; data cleaning graphs; data quality constraints; data repairs; data transformation graph; extract-transform-load process; Chemistry; Manuals; data cleaning; data transformation; integrity constraints; relational databases
Conference_Title :
Information Systems and Technologies (CISTI), 2011 6th Iberian Conference on
Conference_Location :
Chaves
Print_ISBN :
978-1-4577-1487-0