DocumentCode :
3334357
Title :
Two-phase schema matching in real world relational databases
Author :
Bozovic, Nikolaos ; Vassalos, Vasilis
Author_Institution :
Dept. of Inf., Athens Univ. of Econ. & Bus., Athens
fYear :
2008
fDate :
7-12 April 2008
Firstpage :
290
Lastpage :
296
Abstract :
We propose a new approach to the problem of schema matching in relational databases that merges the hybrid and composite approaches to combining multiple individual matching techniques. In particular, we propose assigning individual matchers to two categories: "strong" matchers, which provide a priori higher-quality matches, and "weak" matchers, which may be more sensitive to the inputs and are less reliable but can still help generate some matches. Matching is correspondingly done in two phases: "strong" matches are produced first by combining the strong matchers with a simple voting combiner, and weak matchers then provide additional evidence for attributes left unmatched (again using a voting combiner). We observe that, while many recent advances in schema matching (Madhavan et al., 2005) use composite schema matching and rely on the existence of training schemas to train combiners, in many real-world situations it is not feasible to employ learning techniques because training data (i.e., schemas or instance data) is unavailable. We hypothesize that "weak" matchers can often hurt overall accuracy if used in a "single-phase" composite matcher that does not employ learning techniques. We implement our two-phase approach in the ASID system and evaluate it using real-life schemas. The experiments validate our hypothesis regarding the negative effect of "weak" matchers and also show that ASID performs comparably to state-of-the-art systems while requiring no training schemas. We also demonstrate the benefits of a simple documentation-based matcher. Our experimental data included schemas ranging from 20 to 120 attributes. Note that schemas with 120 attributes are as large as or larger than those used in other published evaluations of relational schema matching.
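The two-phase idea described in the abstract can be illustrated with a minimal sketch. This is not the ASID implementation: the function names (`vote`, `two_phase_match`), the two toy matchers, the tie-breaking rule, and the choice to exclude already-matched target attributes in the second phase are all assumptions made for illustration only.

```python
from collections import Counter


def vote(matchers, source_attr, target_attrs):
    """Simple voting combiner: each matcher proposes one target (or None)."""
    ballots = Counter()
    for matcher in matchers:
        choice = matcher(source_attr, target_attrs)
        if choice is not None:
            ballots[choice] += 1
    if not ballots:
        return None
    ranked = ballots.most_common(2)
    # Assumption: a tie between the top two candidates counts as "no match".
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def two_phase_match(source_attrs, target_attrs, strong, weak):
    """Phase 1 uses only strong matchers; phase 2 applies weak matchers
    to the source attributes that phase 1 left unmatched."""
    matches = {}
    for attr in source_attrs:
        choice = vote(strong, attr, target_attrs)
        if choice is not None:
            matches[attr] = choice
    for attr in source_attrs:
        if attr in matches:
            continue
        # Assumption: targets already claimed are not offered again.
        remaining = [t for t in target_attrs if t not in matches.values()]
        choice = vote(weak, attr, remaining)
        if choice is not None:
            matches[attr] = choice
    return matches


# Hypothetical toy matchers, stand-ins for real name or documentation matchers.
def exact_name(source_attr, target_attrs):
    for t in target_attrs:
        if t.lower() == source_attr.lower():
            return t
    return None


def shared_prefix(source_attr, target_attrs):
    for t in target_attrs:
        if t.lower()[:3] == source_attr.lower()[:3]:
            return t
    return None


result = two_phase_match(
    ["name", "cust_id"],
    ["Name", "customer_no"],
    strong=[exact_name],
    weak=[shared_prefix],
)
print(result)
```

In this toy run the strong phase matches `name` to `Name`, and the weak phase is consulted only for `cust_id`, so a less reliable matcher can never override a match that the strong matchers already produced.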
Keywords :
learning (artificial intelligence); pattern matching; relational databases; documentation-based matcher; machine learning technique; relational database; training schema; two-phase relational schema matching; voting combiner; Availability; Database systems; Informatics; Internet; Machine learning; Neural networks; Ontologies; Relational databases; Training data; Voting;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering Workshop, 2008. ICDEW 2008. IEEE 24th International Conference on
Conference_Location :
Cancun
Print_ISBN :
978-1-4244-2161-9
Electronic_ISBN :
978-1-4244-2162-6
Type :
conf
DOI :
10.1109/ICDEW.2008.4498334
Filename :
4498334