DocumentCode :
3230326
Title :
A Memory-Efficient Strategy for Exploring the Web
Author :
Castillo, Carlos ; Nelli, Alberto ; Panconesi, Alessandro
Author_Institution :
Univ. di Roma "La Sapienza", Rome
fYear :
2006
fDate :
18-22 Dec. 2006
Firstpage :
680
Lastpage :
686
Abstract :
Search engines rely on Web crawlers to create an index of the Web. Web crawlers explore the Web, downloading pages and finding links to new pages to be explored. At any given moment, there are a number of pages waiting to be downloaded in the crawler queue. We study the growth of this queue of pending pages during a crawl of a large subset of the Web. In a normal breadth-first crawler, the queue quickly grows very large. We present a strategy for managing the pending queue that reduces its maximum size by 50% while preserving the coverage and quality of the pages visited. This can be applied to general-purpose Web crawlers as well as topic-specific crawling, peer-to-peer search, on-demand Web crawling, and other environments in which memory usage has to be kept to a minimum.
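As an illustration of the baseline the abstract describes, the sketch below runs a plain breadth-first traversal over a tiny in-memory link graph and records the largest size the pending queue reaches. It is not the authors' queue-management strategy, which the abstract does not detail. The names `bfs_crawl` and `toy_web` and the graph itself are hypothetical, and no real pages are downloaded.

```python
from collections import deque


def bfs_crawl(links, seed):
    """Breadth-first traversal of a toy link graph.

    `links` maps a page to the pages it links to (a stand-in for
    downloading a page and extracting its links). Returns the visit
    order and the peak size of the pending queue.
    """
    pending = deque([seed])   # pages discovered but not yet "downloaded"
    seen = {seed}             # pages already queued or visited
    order = []
    peak = len(pending)

    while pending:
        page = pending.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                pending.append(target)
        # Track how large the pending queue grows during the crawl.
        peak = max(peak, len(pending))

    return order, peak


# Hypothetical toy graph: page "a" links to "b", "c", "d", and so on.
toy_web = {
    "a": ["b", "c", "d"],
    "b": ["e", "f"],
    "c": ["f", "g"],
    "d": ["h"],
}

order, peak = bfs_crawl(toy_web, "a")
print(order, peak)
```

Even on this small graph the queue holds half of all pages at its peak, which is the kind of growth a memory-constrained crawler would want to limit.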
Keywords :
Internet; search engines; Web Search engines; Web crawlers; Web downloading pages; Web index; crawler queue; memory usage; memory-efficient strategy; normal breadth-first crawler; on-demand Web crawling; peer-to-peer search; pending queue management; topic-specific crawling; Crawlers; Large-scale systems; Peer to peer computing; Quality management; Remuneration; Search engines; Visualization; Web pages; Web search; Web server;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2747-7
Type :
conf
DOI :
10.1109/WI.2006.18
Filename :
4061453