• DocumentCode
    3106598
  • Title

    A Regression Model-Based Approach to Accessing the Deep Web

  • Author

    Liu Jing

  • Author_Institution
    Coll. of Comput. Sci., South-Central Univ. for Nat., Wuhan, China
  • fYear
    2011
  • fDate
    16-18 Aug. 2011
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    An increasing number of data sources are now available on the Web, but their contents are often accessible only through query interfaces. For a domain of interest, accessing deep Web content has been a long-standing challenge. In this paper, we propose a deep Web crawling approach based on an ordinal regression model. We divide pages into three levels and treat the feedback of the page classifier as an ordinal regression problem. We also take link delay into account; related links are limited to three layers or fewer. Experimental results demonstrate that the feedback-based crawling strategy can effectively improve crawling speed and accuracy.
  • Keywords
    Internet; Web sites; query processing; regression analysis; data sources; deep Web access; deep Web crawling; feedback; page classifier; query interfaces; regression model; Crawlers; Data mining; Databases; Feature extraction; Search engines; Training; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Internet Technology and Applications (iTAP), 2011 International Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-7253-6
  • Type

    conf

  • DOI
    10.1109/ITAP.2011.6006322
  • Filename
    6006322