• DocumentCode
    2491750
  • Title

    Domain-oriented Deep Web Data Sources' Discovery and Identification

  • Author

    Li, Yingjun ; Nie, Tiezheng ; Shen, Derong ; Yu, Ge

  • Author_Institution
    Inst. of Comput. Software & Theor., Northeastern Univ. Shenyang, Shenyang, China
  • fYear
    2010
  • fDate
    6-8 April 2010
  • Firstpage
    464
  • Lastpage
    467
  • Abstract
    As the Deep Web contains a tremendous number of well-structured data sources, how to integrate them has become a hotspot in current research. Accurately discovering and identifying Deep Web data sources related to a specific domain has become a key issue. We propose a Domain-Oriented Deep Web data source Discovery method (DO-DWD) and a novel Domain Identification strategy for Deep Web data sources (DIDW). In the discovery stage, we use machine learning algorithms and some heuristic rules to find the query interfaces of data sources; in the identification stage, we identify the Deep Web data sources associated with the domain by calculating the relevance between a query interface and the domain based on semantic similarity. Finally, extensive experiments on a real data set show that DO-DWD and DIDW achieve high correctness and accuracy.
  • Keywords
    Internet; learning (artificial intelligence); query processing; user interfaces; data integration; data sources; domain identification strategy; domain-oriented deep Web data source discovery; machine learning algorithms; query interfaces; semantic similarity; Data engineering; Data mining; Databases; Educational institutions; Information science; Probes; Radio control; Software
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2010 12th International Asia-Pacific Web Conference (APWeb)
  • Conference_Location
    Busan
  • Print_ISBN
    978-0-7695-4012-2
  • Electronic_ISBN
    978-1-4244-6600-9
  • Type

    conf

  • DOI
    10.1109/APWeb.2010.54
  • Filename
    5474088