• DocumentCode
    3045395
  • Title
    A sample-guided approach to incremental structured web database crawling
  • Author
    Liu, Wei ; Xiao, Jianguo ; Yang, Jianwu
  • Author_Institution
    Key Lab. of Comput. Linguistics, Peking Univ., Beijing, China
  • fYear
    2010
  • fDate
    20-23 June 2010
  • Firstpage
    890
  • Lastpage
    895
  • Abstract
    Web database crawling is a promising solution for Deep Web data integration. To the best of our knowledge, existing approaches focus only on crawling all the records in a web database. Because most web databases change frequently, recrawling the whole database just to harvest the small proportion of new records is impractical. This paper studies the problem of incremental web database crawling, which aims to efficiently crawl the new records in a web database. The proposed approach introduces a new graph model, the query related graph, which transforms an incremental crawling task into a graph traversal process. Based on this model, crawling queries are generated under the guidance of samples drawn from the web database. Extensive experimental evaluations over real web databases validate the effectiveness of our techniques and provide insights for future efforts in this direction.
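    The abstract does not define the query related graph in detail, so the following Python is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that nodes are attribute values usable as query terms, that an edge joins two values co-occurring in one record, that the crawl is seeded from sampled records and expands breadth-first, and that a record counts as new when its `id` is absent from the set harvested by earlier crawls. The names `incremental_crawl`, `add_record`, `record_terms`, `DEMO_DB`, and `demo_query` are hypothetical, and the degree-based seeding is an illustrative heuristic, not the paper's query-selection strategy.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set

Record = Dict[str, str]


def record_terms(rec: Record) -> List[str]:
    """Query terms of a record: every attribute value except the key 'id' (assumption)."""
    return [v for k, v in rec.items() if k != "id"]


def add_record(graph: Dict[str, Set[str]], rec: Record) -> None:
    """Hypothetical edge rule: link every pair of terms that co-occur in one record."""
    terms = record_terms(rec)
    for t in terms:
        graph.setdefault(t, set())
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if a != b:
                graph[a].add(b)
                graph[b].add(a)


def incremental_crawl(
    query: Callable[[str], List[Record]],
    samples: Iterable[Record],
    known_ids: Set[str],
    max_queries: int = 100,
) -> List[Record]:
    """Traverse the term graph, one query per term, returning only unseen records."""
    graph: Dict[str, Set[str]] = {}
    for rec in samples:
        add_record(graph, rec)
    # Seed with sample terms, best-connected first (illustrative heuristic only).
    frontier = deque(sorted(graph, key=lambda t: (-len(graph[t]), t)))
    issued: Set[str] = set()
    seen = set(known_ids)
    new_records: List[Record] = []
    while frontier and len(issued) < max_queries:
        term = frontier.popleft()
        if term in issued:
            continue  # each term is queried at most once
        issued.add(term)
        for rec in query(term):
            if rec["id"] in seen:
                continue  # already harvested by an earlier crawl
            seen.add(rec["id"])
            new_records.append(rec)
            add_record(graph, rec)
            # Expand along the edges contributed by the newly found record.
            frontier.extend(t for t in record_terms(rec) if t not in issued)
    return new_records


# Toy in-memory "web database" (hypothetical data) with a keyword query interface.
DEMO_DB: List[Record] = [
    {"id": "1", "author": "ann", "venue": "icia"},
    {"id": "2", "author": "bob", "venue": "icia"},
    {"id": "3", "author": "cat", "venue": "www"},
    {"id": "4", "author": "ann", "venue": "www"},
    {"id": "5", "author": "dan", "venue": "kdd"},
]


def demo_query(term: str) -> List[Record]:
    """Return every record that has `term` as one of its attribute values."""
    return [r for r in DEMO_DB if term in record_terms(r)]


if __name__ == "__main__":
    found = incremental_crawl(demo_query, samples=[DEMO_DB[0]], known_ids={"1"})
    print([r["id"] for r in found])
```

    Starting from one sampled record whose `id` is already known, the sketch reaches records 4, 2, and 3 through shared values; record 5 shares no value with them and is never reached, which shows how strongly such a traversal depends on the samples it starts from.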
  • Keywords
    Internet; data handling; database management systems; graph theory; Deep Web data integration; graph traversal process; incremental structured Web database crawling; query related graph; sample guided approach; Automation; Computational linguistics; Computer science; Crawlers; Data analysis; Databases; Hardware; History; Laboratories; Web database; Web database crawling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Information and Automation (ICIA), 2010 IEEE International Conference on
  • Conference_Location
    Harbin
  • Print_ISBN
    978-1-4244-5701-4
  • Type
    conf
  • DOI
    10.1109/ICINFA.2010.5512131
  • Filename
    5512131