• DocumentCode
    548883
  • Title

    Extending data cleaning graphs for supporting user involvement

  • Author

    Galhardas, Helena ; Lopes, Antonia ; Santos, Emanuel

  • Author_Institution
    INESC-ID, Lisbon, Portugal
  • fYear
    2011
  • fDate
    15-18 June 2011
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    A data cleaning or Extract-Transform-Load (ETL) process is usually modeled as a data transformation graph. These graphs typically involve a large number of data transformations and must handle large amounts of data. The involvement of the users responsible for executing data cleaning processes over real data is crucial for tuning data transformations and for manually correcting data items that cannot be handled automatically. In this paper, we extend the notion of a data cleaning graph so that it better supports user interaction in data cleaning processes. We propose that data cleaning graphs contain: (i) data quality constraints, which help users identify the points in the graph and the records that require their attention; and (ii) manual data repairs, which represent the way users can supply the knowledge required to manually clean some data records.
  • Keywords
    data handling; graph grammars; data cleaning graphs; data quality constraints; data repairs; data transformation graph; extract-transform-load process; Chemistry; Manuals; data cleaning; data transformation; integrity constraints; relational databases;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Systems and Technologies (CISTI), 2011 6th Iberian Conference on
  • Conference_Location
    Chaves
  • Print_ISBN
    978-1-4577-1487-0
  • Type

    conf

  • Filename
    5974328