• DocumentCode
    3177214
  • Title
    Instance Discovery and Schema Matching with Applications to Biological Deep Web Data Integration
  • Author
    Liu, Tantan; Wang, Fan; Agrawal, Gagan
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
  • fYear
    2010
  • fDate
    May 31 2010-June 3 2010
  • Firstpage
    304
  • Lastpage
    305
  • Abstract
    We present data mining-based techniques for enabling data integration across deep web data sources. We target query processing across interdependent data sources. Because the sources are interdependent, input-output matching of attributes must be considered in addition to input-input and output-output matching. We develop data mining techniques for discovering the instances used to query deep web data sources. These instances are drawn from the information provided by the query interfaces themselves and from the output pages of related data sources, which are obtained by query probing with dynamically identified input instances. Then, using a hierarchical representation of schemas and applying clustering techniques, we generate schema matches. We demonstrate the effectiveness of our techniques by integrating 24 query interfaces.
  • Keywords
    Internet; bioinformatics; data mining; pattern clustering; pattern matching; query processing; biological deep Web data integration; clustering techniques; data mining-based techniques; deep Web data sources; input-input attribute matching; input-output attribute matching; interdependent data sources; output-output attribute matching; query interfaces; query probing; query processing; schema matching; Amino acids; Application software; Bioinformatics; Biological information theory; Biomedical engineering; Data mining; Databases; Humans; Impedance matching; Proteins; Deep Web; Schema Matching;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    BioInformatics and BioEngineering (BIBE), 2010 IEEE International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    978-1-4244-7494-3
  • Type
    conf
  • DOI
    10.1109/BIBE.2010.65
  • Filename
    5521664