• DocumentCode
    2162524
  • Title

    Automate discovery of deep web interfaces

  • Author

    Du, Xin ; Zheng, Yongqing ; Yan, Zhongmin

  • Author_Institution
    School of Computer Science and Technology, Shandong University, Jinan, China
  • fYear
    2010
  • fDate
    4-6 Dec. 2010
  • Firstpage
    3572
  • Lastpage
    3575
  • Abstract
    With the rapid increase of web sources, more and more deep web databases have become available. The information in these databases can only be accessed by submitting queries to the back-end databases. However, traditional search engine interfaces closely resemble deep web interfaces, so it is difficult to distinguish between them and to find deep web interfaces. This paper proposes a novel method for discovering deep web interfaces. We introduce a page division method that divides pages into separate parts, and then remove the parts that do not contain search interfaces. Finally, we construct topic-specific queries, submit them to obtain results, and identify deep web interfaces by analyzing those results. Experimental results show that this method is effective and stable.
  • Keywords
    Accuracy; Crawlers; Databases; HTML; Layout; Web pages; Deep Web; Interface Extraction; Tag Trees;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science and Engineering (ICISE), 2010 2nd International Conference on
  • Conference_Location
    Hangzhou, China
  • Print_ISBN
    978-1-4244-7616-9
  • Type

    conf

  • DOI
    10.1109/ICISE.2010.5691802
  • Filename
    5691802