  • DocumentCode
    2534508
  • Title
    Evaluating and improving integration quality for heterogeneous data sources using statistical analysis
  • Author
    Altareva, Evgeniya; Conrad, Stefan
  • Author_Institution
    Dept. of Comput. Sci., Dusseldorf Univ., Germany
  • fYear
    2005
  • fDate
    25-27 July 2005
  • Firstpage
    406
  • Lastpage
    414
  • Abstract
    This paper considers the problem of integrating heterogeneous semi-structured data sources with the purpose of estimating integration quality (IQ). Integrating such data sources yields results of unpredictable trustworthiness, and none of the existing methods can account for the uncertainty that accumulates over all integration steps and affects integration quality. To compute these uncertainties, we suggest using the well-established statistical method of Latent Class Analysis (LCA). This method makes it possible to analyze the influence of latent factors associated with real-world entities on the data. Using examples, we show how the proposed approach can be used to evaluate and improve IQ, giving users concerned with the data's trustworthiness an important tool.
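    The abstract names Latent Class Analysis but does not give its formulation. The following is a minimal sketch, assuming the simplest LCA variant: binary indicators that are conditionally independent given the latent class, fitted by expectation-maximization (EM). Reading each row as a candidate real-world entity and each column as a hypothetical "sources agree on attribute j" indicator is an illustrative assumption, not the authors' model; the names `lca_em` and `rows` are likewise hypothetical. The posterior class probabilities are one way such a model can express per-entity uncertainty.

```python
import math
import random


def lca_em(data, n_classes=2, n_iter=200, tol=1e-8, seed=0):
    """Fit a latent class model with binary indicators by EM (illustrative sketch).

    data: list of rows, each a list of 0/1 indicator values.
    Returns (prevalences, item_probs, posteriors, log_likelihood); the
    posteriors and log-likelihood come from the final E-step.
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    # Class prevalences start uniform; item probabilities start random in (0.25, 0.75).
    prev = [1.0 / n_classes] * n_classes
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    eps = 1e-9  # keeps probabilities away from 0 and 1 so log() stays defined
    prev_ll = -math.inf
    posteriors, ll = [], -math.inf
    for _ in range(n_iter):
        # E-step: posterior class membership of every row, computed in log space.
        posteriors = []
        ll = 0.0
        for row in data:
            logs = []
            for k in range(n_classes):
                s = math.log(prev[k])
                for j, x in enumerate(row):
                    p = min(max(probs[k][j], eps), 1 - eps)
                    s += math.log(p) if x else math.log(1 - p)
                logs.append(s)
            m = max(logs)
            total = sum(math.exp(v - m) for v in logs)
            ll += m + math.log(total)  # log-sum-exp of the joint terms
            posteriors.append([math.exp(v - m) / total for v in logs])
        # M-step: re-estimate prevalences and conditional item probabilities.
        for k in range(n_classes):
            weight = sum(post[k] for post in posteriors)
            prev[k] = max(weight / len(data), eps)
            for j in range(n_items):
                hits = sum(post[k] * row[j] for post, row in zip(posteriors, data))
                probs[k][j] = hits / weight if weight > 0 else 0.5
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return prev, probs, posteriors, ll


# Hypothetical agreement indicators from three sources for 100 candidate entities.
rows = ([[1, 1, 1]] * 40 + [[1, 1, 0]] * 10
        + [[0, 0, 0]] * 40 + [[0, 0, 1]] * 10)
prevalences, item_probs, posteriors, log_lik = lca_em(rows)
```

    EM is used here because it is the standard way to fit latent class models; the sketch uses plain Python so that it has no dependencies, at the cost of speed on large data sets.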
  • Keywords
    distributed databases; heterogeneous data sources; heterogeneous semistructured data sources; integration quality; latent class analysis; statistical analysis; Cleaning; Computer science; Data engineering; Data mining; Databases; Information systems; Statistical analysis; Uncertainty;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Database Engineering and Application Symposium, 2005. IDEAS 2005. 9th International
  • ISSN
    1098-8068
  • Print_ISBN
    0-7695-2404-4
  • Type
    conf
  • DOI
    10.1109/IDEAS.2005.25
  • Filename
    1540931