DocumentCode :
2653732
Title :
A Comprehensive Prediction Method of Visit Priority for Focused Crawler
Author :
Li, Xueming ; Xing, Minling ; Zhang, Jiapei
Author_Institution :
Coll. of Comput. Sci., Chongqing Univ., Chongqing, China
fYear :
2011
fDate :
22-23 Oct. 2011
Firstpage :
27
Lastpage :
30
Abstract :
The purpose of a focused crawler is to precisely crawl more of the topic-relevant portions of the Internet. How well the crawler predicts the visit priorities of candidate URLs whose corresponding pages have yet to be fetched determines its ability to retrieve more relevant pages. This paper introduces a comprehensive prediction method to address this problem. The method adopts a page partition algorithm, which partitions a page into smaller blocks, and interclass rules, which statistically capture linkage relationships among the topic classes, to help the focused crawler cross tunnels and to enlarge its coverage. The URL address, anchor text, and block content are then used to predict visit priority more precisely. Experiments are carried out on the target topic of tennis, and the results show that a crawler based on this method is more effective than a rule-based crawler in terms of harvest ratio.
Keywords :
Internet; information retrieval; search engines; Internet; URL address; anchor text; block content; candidate URL; comprehensive prediction method; focused crawler coverage; harvest ratio; interclass rules; page partition algorithm; rule-based crawler; tennis; visit priority; Crawlers; Educational institutions; Partitioning algorithms; Search engines; Support vector machine classification; Training; Vectors; focused crawler; interclass rules; page partition; visit priority;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligence Information Processing and Trusted Computing (IPTC), 2011 2nd International Symposium on
Conference_Location :
Hubei
Print_ISBN :
978-1-4577-1130-5
Type :
conf
DOI :
10.1109/IPTC.2011.14
Filename :
6103528