• DocumentCode
    2253106
  • Title

    Autonomic Wrapper Induction using Minimal Type System from Web Data

  • Author

    Son, Youngju ; Jamil, Hasan ; Fotouhi, Farshad

  • Author_Institution
    Comput. Sci., Wayne State Univ., Detroit, MI
  • fYear
    2005
  • fDate
    5-8 Dec. 2005
  • Firstpage
    130
  • Lastpage
    135
  • Abstract
    Biological and genomic source integration has become a major research field. Most biological data is provided over the Web. This Web data is unstructured and cannot be queried using traditional query languages. Furthermore, the integration of biological data is complicated by several factors, such as the variety of data types, presentations and formats, so it is not easy to find the desired data across diverse data sources. Although humans can easily understand Web data, which is heterogeneous and unstructured, a machine cannot interpret it on its own. For a machine to extract data from the Web, it requires knowledge of both the structure and the content of that data. We propose a novel architecture for automatic wrapper induction that exploits a user-supplied type system and an ontology to establish schema correspondence precisely and efficiently. In this paper, the type system helps recognize target data and improves the precision of schema matching, which is otherwise impossible without manual intervention.
  • Keywords
    Internet; biology computing; information retrieval; ontologies (artificial intelligence); Web data extraction; autonomic wrapper induction; biological source integration; genomic source integration; minimal type system; ontology; Computer science; Costs; Data mining; Electronic mail; Engines; Genomics; HTML; Induction generators; Mediation; Ontologies; Information Extraction; Type Hierarchy;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Artificial Intelligence, 2005. EPIA 2005. Portuguese Conference on
  • Conference_Location
    Covilha
  • Print_ISBN
    0-7803-9366-X
  • Electronic_ISBN
    0-7803-9366-X
  • Type

    conf

  • DOI
    10.1109/EPIA.2005.341280
  • Filename
    4145939