Title :
Ontologies as a solution for simultaneously integrating and reconciliating data sources
Author :
Bakhtouchi, Abdelghani ; Bellatreche, Ladjel ; Jean, Stéphane ; Ait-Ameur, Yamine
Author_Institution :
Nat. High Sch. for Comput. Sci. (ESI), Algiers, Algeria
Abstract :
With the increasing need for worldwide enterprises to integrate, share and visualize data from heterogeneous, autonomous and distributed data sources and from Web data covering a given domain, the development of integration and reconciliation solutions has become a challenging issue. Existing studies on data integration and on the reconciliation of results have been developed in isolation and have not considered the strong link between these two processes. On the one hand, ontologies have been widely used to build automatic integration systems because of their ability to reduce the schematic and semantic heterogeneities that may exist among sources. On the other hand, reconciliation of results is performed either by assuming that all sources use the same identifier for an instance or by means of statistical methods that identify affinities between concepts. These reconciliation solutions are usually not suitable for sensitive real-world applications where exact results are required and where each source may use a different identifier for the same concept. In this paper, we propose a methodology that simultaneously integrates source data and reconciles their instances, based on ontologies enriched with functional dependencies (FD) in a mediation architecture. The presence of FDs gives sources more autonomy in choosing their primary keys and facilitates result reconciliation. The methodology is evaluated on the Lehigh University Benchmark (LUBM) dataset to show its scalability and the quality of the result reconciliation phase.
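The idea of reconciling instances through a functional dependency, rather than a shared identifier or a statistical similarity measure, can be illustrated with a minimal sketch. It is not the authors' implementation: the FD (email determines the student instance), the property names, the two sources and the reconcile helper are all hypothetical, chosen only to show exact matching on the left-hand side of an FD when each source uses its own primary key.

```python
from collections import defaultdict

# Hypothetical functional dependency on an ontology class "Student":
# the properties on the left-hand side determine the instance.
FD_LHS = ("email",)

# Two hypothetical sources that chose different primary keys
# ("id" in source_a, "student_no" in source_b).
source_a = [
    {"id": "A17", "email": "lee@univ0.edu", "name": "Lee"},
    {"id": "A42", "email": "kim@univ0.edu", "name": "Kim"},
]
source_b = [
    {"student_no": 903, "email": "lee@univ0.edu", "advisor": "Prof. Chen"},
]


def reconcile(sources, fd_lhs):
    """Merge records that agree exactly on every left-hand-side property of the FD."""
    merged = defaultdict(dict)
    for records in sources:
        for record in records:
            # The FD left-hand side acts as a source-independent key.
            key = tuple(record[prop] for prop in fd_lhs)
            merged[key].update(record)
    return dict(merged)


result = reconcile([source_a, source_b], FD_LHS)
```

In this sketch the two records for lee@univ0.edu are merged into one instance even though the sources share no identifier, and the match is exact rather than probabilistic.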
Keywords :
Internet; data integration; data visualisation; distributed databases; ontologies (artificial intelligence); statistical analysis; FD; LUBM dataset; Lehigh University Benchmark dataset; Web data; automatic integration systems; autonomous data source; data sharing; data source integration; data source reconciliation; data visualization; distributed data source; enterprises; functional dependencies; heterogeneous data source; identifier; mediation architecture; ontologies; schematic heterogeneity reduction; semantic heterogeneity reduction; statistical methods; Distributed databases; Educational institutions; Electronic mail; Ontologies; Semantics; Silicon; Data integration; data reconciliation; ontology;
Conference_Title :
Research Challenges in Information Science (RCIS), 2012 Sixth International Conference on
Conference_Location :
Valencia
Print_ISBN :
978-1-4577-1936-3
Electronic_ISBN :
2151-1349
DOI :
10.1109/RCIS.2012.6240431