DocumentCode :
189289
Title :
Comparison of Scheduling Algorithms for Domain Specific Web Crawler
Author :
Filipowski, Krzysztof
Author_Institution :
Dept. of Comput. Syst. & Networks, Wroclaw Univ. of Technol., Wroclaw, Poland
fYear :
2014
fDate :
29-30 Sept. 2014
Firstpage :
69
Lastpage :
74
Abstract :
Domain-specific Web crawlers are effective tools for acquiring information from the Web. One of the most crucial factors influencing the efficiency of domain crawlers is the choice of crawling strategy. This article describes and compares several strategies for domain-specific Web crawling. It concentrates particularly on scheduling algorithms, which determine the order in which the URLs collected by the crawler are crawled. The objective of these strategies is to download the most relevant Web pages at an early stage of the crawl. The paper presents four different algorithms and compares them using several metrics.
Keywords :
Internet; Web sites; information retrieval; scheduling; Web pages; domain specific Web crawler; information retrieval; scheduling algorithms; Algorithm design and analysis; Crawlers; Internet; Search engines; Search problems; Uniform resource locators; Web pages; Best N-First Search; Best-First Search; Domain Specific Crawling; Exploration; Information Retrieval;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Network Intelligence Conference (ENIC), 2014 European
Conference_Location :
Wroclaw
Type :
conf
DOI :
10.1109/ENIC.2014.14
Filename :
6984893