  • DocumentCode
    2846809
  • Title
    Corpus-based schema matching
  • Author
    Madhavan, Jayant; Bernstein, Philip A.; Doan, AnHai; Halevy, Alon
  • Author_Institution
    Univ. of Washington, Seattle, WA, USA
  • fYear
    2005
  • fDate
    5-8 April 2005
  • Firstpage
    57
  • Lastpage
    68
  • Abstract
    Schema matching is the problem of identifying corresponding elements in different schemas. Discovering these correspondences, or matches, is inherently difficult to automate. Past solutions have proposed a principled combination of multiple algorithms. However, these solutions sometimes perform rather poorly because the schemas being matched do not provide sufficient evidence. In this paper, we show how a corpus of schemas and mappings can be used to augment the evidence about the schemas being matched, so that they can be matched better. Such a corpus typically contains multiple schemas that model similar concepts and hence enables us to learn variations in the elements and their properties. We exploit such a corpus in two ways. First, we increase the evidence about each element being matched by including evidence from similar elements in the corpus. Second, we learn statistics about elements and their relationships and use them to infer constraints, which we then use to prune candidate mappings. We also describe how to use known mappings to learn the importance of domain and generic constraints. We present experimental results that demonstrate that corpus-based matching outperforms direct matching (without the benefit of a corpus) in multiple domains.
  • Keywords
    case-based reasoning; data integrity; data mining; database management systems; learning (artificial intelligence); AI learning; artificial intelligence; candidate mapping; concept discovery; corpus-based schema matching; domain constraints; inference mechanism; Buildings; Data models; Databases; Documentation; Humans; Message passing; Ontologies; Pattern matching; Statistics; Web services
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2005. ICDE 2005. Proceedings. 21st International Conference on
  • ISSN
    1084-4627
  • Print_ISBN
    0-7695-2285-8
  • Type
    conf
  • DOI
    10.1109/ICDE.2005.39
  • Filename
    1410106