• DocumentCode
    2291838
  • Title

    A probabilistic model for intelligent Web crawlers

  • Author

    Hu, Ke ; Wong, Wing Shing

  • Author_Institution
    Dept. of Inf. Eng., Chinese Univ. of Hong Kong, Shatin, China
  • fYear
    2003
  • fDate
    3-6 Nov. 2003
  • Firstpage
    278
  • Lastpage
    282
  • Abstract
    With the enormous growth of the World Wide Web in recent years, discovering Web pages efficiently has become an important challenge for Web crawler designers. In this paper, we outline a simple model to predict the distribution of the search depth at which a breadth-first search reaches the first Web pages relevant to a user query. We define this probability as the crawler confidence. Recent studies by Y. Deshpande and S. Hansen (2001) indicate that, at a large scale, the Web structure follows a power-law distribution in several respects. Our work, in contrast, tries to model the microscopic linkage structure of the Web from an intelligent crawler's point of view. With the information provided by crawler confidence, an intelligent crawler can adjust its crawling behavior to achieve a higher harvest rate.
  • Keywords
    Internet; Web design; data mining; probability; search engines; Web crawler design; Web crawlers; Web pages; Web structure; World Wide Web; breadth-first search; intelligent crawler; microscopic linkage structure; power law distribution; probabilistic model; search engines; user querying; Couplings; Crawlers; Databases; Design engineering; Indexing; Intelligent robots; Intelligent structures; Search engines; Web pages; World Wide Web;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Software and Applications Conference, 2003. COMPSAC 2003. Proceedings. 27th Annual International
  • ISSN
    0730-3157
  • Print_ISBN
    0-7695-2020-0
  • Type

    conf

  • DOI
    10.1109/CMPSAC.2003.1245354
  • Filename
    1245354