DocumentCode
2291838
Title
A probabilistic model for intelligent Web crawlers
Author
Hu, Ke ; Wong, Wing Shing
Author_Institution
Dept. of Inf. Eng., Chinese Univ. of Hong Kong, Shatin, China
fYear
2003
fDate
3-6 Nov. 2003
Firstpage
278
Lastpage
282
Abstract
With the enormous growth of the World Wide Web in recent years, discovering Web pages efficiently has become an important challenge for Web crawler designers. In this paper, we outline a simple model to predict the distribution of the search depth in a breadth-first search to reach the first Web pages relevant to a user query. We define this probability as the crawler confidence. Recent studies by Y. Deshpande and S. Hansen (2001) indicate that, at a large scale, the Web structure follows a power law distribution in several respects. Our work, by contrast, tries to model the microscopic linkage structure of the Web from an intelligent crawler's point of view. With the information provided by crawler confidence, an intelligent crawler can adjust its crawling behavior to achieve a higher harvest rate.
Keywords
Internet; Web design; data mining; probability; search engines; Web crawler design; Web crawlers; Web pages; Web structure; World Wide Web; breadth-first search; intelligent crawler; microscopic linkage structure; power law distribution; probabilistic model; user querying; Couplings; Crawlers; Databases; Design engineering; Indexing; Intelligent robots; Intelligent structures
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Software and Applications Conference, 2003. COMPSAC 2003. Proceedings. 27th Annual International
ISSN
0730-3157
Print_ISBN
0-7695-2020-0
Type
conf
DOI
10.1109/CMPSAC.2003.1245354
Filename
1245354