• DocumentCode
    3424041
  • Title
    Automatic Generation of Ontology from the Deep Web
  • Author
    An, Yoo Jung; Geller, James; Wu, Yi-ta; Chun, Soon Ae
  • Author_Institution
    New Jersey Inst. of Technol., Newark
  • fYear
    2007
  • fDate
    3-7 Sept. 2007
  • Firstpage
    470
  • Lastpage
    474
  • Abstract
    The term "deep Web" refers to Web pages that are not accessible to search engines, e.g., because those Web pages are dynamically generated in response to queries through Web forms or Web services. Existing automated Web crawlers cannot index these pages, so they remain hidden from Web search engines. Our goal is to properly annotate such deep Web services (i.e., content generation interfaces of hidden Web sources) with semantic indexing by constructing domain-specific ontologies to represent the contents of the deep Web sources. The fully automatic derivation of ontologies from Web sources without human review is, to date, a challenging research issue. We present a novel approach to automatically building a large, yet domain-specific, ontology by interweaving sub-taxonomies of WordNet with domain-specific information extracted from deep Web service pages. Our algorithms extract domain concepts from deep Web sources and augment them with concepts and relationships from WordNet to construct ontology fragments. Structurally, these are directed acyclic graphs (DAGs). An iterative process of extracting WordNet concepts and relationships and bridging concept gaps is used to tie together disparate domain concepts and ontology fragments into one ontology. Using eight domains (airfares, jobs, etc.) from a well-known test-bed, our algorithms constructed an ontology of 1692 concepts from deep Web sources and 4434 concepts from WordNet. This ontology is expressed in the OWL format to support semantic Web searches.
  • Keywords
    Web services; directed graphs; indexing; iterative methods; ontologies (artificial intelligence); search engines; OWL format; Web crawlers; Web forms; Web pages; WordNet concepts extraction; deep Web services; directed acyclic graphs; domain-specific ontologies; iterative process; ontology generation; search engines; semantic Web searching; semantic indexing; Crawlers; Data mining; Humans; Indexing; Iterative algorithms; Ontologies; Search engines; Web pages; Web search; Web services
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Applications, 2007. DEXA '07. 18th International Workshop on
  • Conference_Location
    Regensburg
  • ISSN
    1529-4188
  • Print_ISBN
    978-0-7695-2932-5
  • Type
    conf
  • DOI
    10.1109/DEXA.2007.43
  • Filename
    4312938