Title :
A Scalable Lightweight Distributed Crawler for Crawling with Limited Resources
Author :
Kc, Milly ; Hagenbuchner, Markus ; Tsoi, Ah Chung
Author_Institution :
Univ. of Wollongong, Wollongong, NSW
Abstract :
Web page crawlers are an essential component in a number of Web applications. The sheer size of the Internet can pose problems in the design of Web crawlers. All currently known crawlers implement approximations or have limitations so as to maximize the throughput of the crawl and, hence, the number of pages that can be retrieved within a given time frame. This paper proposes a distributed crawling concept which is designed to avoid approximations, to limit the network overhead, and to run on relatively inexpensive hardware. A set of experiments and comparisons highlights the effectiveness of the proposed approach.
Keywords :
Internet; Web sites; information retrieval; Web page crawlers; distributed crawling concept; limited resources; network overhead; scalable lightweight distributed crawler; Australia; Bandwidth; Crawlers; Hardware; Intelligent agent; Throughput; Web pages; Web search; Distributed Crawler; Web crawler; complete crawl
Conference_Title :
Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT '08. IEEE/WIC/ACM International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-0-7695-3496-1
DOI :
10.1109/WIIAT.2008.234