• DocumentCode
    2655111
  • Title
    A framework of deep Web crawler
  • Author
    Peisu, Xiang; Ke, Tian; Qinzhen, Huang
  • Author_Institution
    Coll. of Electr. Inf. Eng., Southwest Univ. for Nat., Chengdu
  • fYear
    2008
  • fDate
    16-18 July 2008
  • Firstpage
    582
  • Lastpage
    586
  • Abstract
    As an ever-increasing amount of information on the Web today is available through search interfaces, users have to enter a set of keywords to access pages from certain Web sites; such pages are often referred to as the hidden Web or the deep Web. Since there are no static links to hidden Web pages, search engines cannot discover and index them. However, according to recent studies, the content provided by many hidden Web sites is often of very high quality and can be extremely valuable to many users. This paper studies how to build an effective hidden Web crawler that can autonomously discover and download pages from the hidden Web. A framework for a deep Web crawler is presented, and novel techniques are proposed to handle the actual mechanics of crawling the deep Web. Experimental results show that these policies are effective.
  • Keywords
    Web sites; search engines; Web page; Web site; content management; deep Web crawler framework; search engine; user interface; Buildings; Crawlers; Databases; Educational institutions; Electronic mail; Fuzzy sets; Search engines; Service oriented architecture; Web pages; Deep Web; Deep Web Crawler;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    27th Chinese Control Conference, 2008 (CCC 2008)
  • Conference_Location
    Kunming
  • Print_ISBN
    978-7-900719-70-6
  • Electronic_ISBN
    978-7-900719-70-6
  • Type
    conf
  • DOI
    10.1109/CHICC.2008.4604881
  • Filename
    4604881