DocumentCode
548883
Title
Extending data cleaning graphs for supporting user involvement
Author
Galhardas, Helena ; Lopes, Antonia ; Santos, Emanuel
Author_Institution
INESC-ID e, Lisbon, Portugal
fYear
2011
fDate
15-18 June 2011
Firstpage
1
Lastpage
6
Abstract
A data cleaning or Extract-Transform-Load (ETL) process is usually modeled as a data transformation graph. These graphs typically involve a large number of data transformations and must handle large amounts of data. The involvement of the users responsible for executing data cleaning processes over real data is crucial for tuning data transformations and for manually correcting data items that cannot be handled automatically. In this paper, we extend the notion of a data cleaning graph so that it better supports user interaction in data cleaning processes. We propose that data cleaning graphs contain: (i) data quality constraints, which help users identify the points in the graph and the records that require their attention; and (ii) manual data repairs, which represent the way users can supply the knowledge required to manually clean some data records.
Keywords
data handling; graph grammars; data cleaning graphs; data quality constraints; data repairs; data transformation graph; extract-transform-load process; Chemistry; Manuals; data cleaning; data transformation; integrity constraints; relational databases;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Systems and Technologies (CISTI), 2011 6th Iberian Conference on
Conference_Location
Chaves
Print_ISBN
978-1-4577-1487-0
Type
conf
Filename
5974328