DocumentCode
189289
Title
Comparison of Scheduling Algorithms for Domain Specific Web Crawler
Author
Filipowski, Krzysztof
Author_Institution
Dept. of Comput. Syst. & Networks, Wroclaw Univ. of Technol., Wroclaw, Poland
fYear
2014
fDate
29-30 Sept. 2014
Firstpage
69
Lastpage
74
Abstract
Domain-specific Web crawlers are effective tools for acquiring information from the Web. One of the most crucial factors influencing the efficiency of domain crawlers is the choice of crawling strategy. This article describes and compares several strategies for domain-specific Web crawling. It concentrates on scheduling algorithms, which determine the order in which the URLs collected by the crawler are crawled. The objective of these strategies is to download the most relevant Web pages at an early stage of the crawl. The paper presents four different algorithms and compares them using several metrics.
Keywords
Internet; Web sites; information retrieval; scheduling; Web pages; domain specific Web crawler; information retrieval; scheduling algorithms; Algorithm design and analysis; Crawlers; Internet; Search engines; Search problems; Uniform resource locators; Web pages; Best N-First Search; Best-First Search; Domain Specific Crawling; Exploration; Information Retrieval
fLanguage
English
Publisher
IEEE
Conference_Titel
Network Intelligence Conference (ENIC), 2014 European
Conference_Location
Wroclaw
Type
conf
DOI
10.1109/ENIC.2014.14
Filename
6984893