DocumentCode
468231
Title
An Application of Improved PageRank in Focused Crawler
Author
Zhang, Yulian ; Yin, Chunxia ; Yuan, Fuyong
Author_Institution
Yanshan Univ., Qinhuangdao
Volume
2
fYear
2007
fDate
24-27 Aug. 2007
Firstpage
331
Lastpage
335
Abstract
The focused crawler of a special-purpose search engine aims to selectively seek out pages that are relevant to a pre-defined set of topics, rather than to explore all regions of the Web. The PageRank algorithm is often used to rank web pages, and it is also used for URL ordering in focused crawlers. It estimates a page's authority from the link structure of the Web. However, it assigns every outlink the same weight and ignores topics, which leads to topic drift. In this paper, we propose an improved PageRank algorithm, which we call "To-PageRank", and then present a crawling strategy that combines the To-PageRank algorithm with the topic similarity of the hyperlink metadata. Experiments with a focused crawler show that the improved crawling strategy performs better than the Breadth-first and PageRank algorithms.
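Illustrative sketch (not the authors' To-PageRank): the abstract does not give the To-PageRank formula, so the code below only illustrates the general idea it describes. It assumes a hypothetical per-page topic-relevance score in [0, 1] and lets each page split its PageRank among its outlinks in proportion to the targets' relevance instead of equally. It also assumes crawl-frontier priority is a linear blend (hypothetical weight alpha) of the normalized rank and a bag-of-words cosine similarity between anchor text and the topic. The names topic_weighted_pagerank, order_frontier, relevance and alpha are hypothetical.

```python
import heapq
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two strings (0.0 if either is empty)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(x * x for x in va.values())) * math.sqrt(sum(x * x for x in vb.values()))
    return dot / norm if norm else 0.0


def topic_weighted_pagerank(
    links: Dict[str, List[str]],
    relevance: Dict[str, float],  # hypothetical topic relevance per page
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Power-iteration PageRank where outlinks are weighted by target relevance.

    With equal relevance everywhere this reduces to standard PageRank.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        dangling = 0.0
        for src in pages:
            targets = links.get(src, [])
            total = sum(relevance.get(t, 0.0) for t in targets)
            if not targets or total == 0.0:
                # No usable outlinks: spread this page's score uniformly.
                dangling += rank[src]
                continue
            for t in targets:
                new[t] += damping * rank[src] * relevance.get(t, 0.0) / total
        for p in pages:
            new[p] += damping * dangling / n
        rank = new
    return rank


def order_frontier(
    candidates: List[Tuple[str, str]],  # (url, anchor_text) pairs
    rank: Dict[str, float],
    topic: str,
    alpha: float = 0.5,  # hypothetical blend weight
) -> List[str]:
    """Return candidate URLs, highest blended priority first."""
    top = max(rank.values(), default=1.0) or 1.0
    heap: List[Tuple[float, str]] = []
    for url, anchor in candidates:
        score = alpha * rank.get(url, 0.0) / top + (1 - alpha) * cosine(anchor, topic)
        heapq.heappush(heap, (-score, url))  # negate: heapq is a min-heap
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

Usage: for links = {"hub": ["x", "y"]} with relevance {"x": 0.9, "y": 0.1}, page x receives most of the hub's score, whereas plain PageRank would split it equally. This is the topic-independence the abstract criticizes.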
Keywords
Internet; meta data; search engines; PageRank algorithm; URL ordering; Web pages ranking; focused crawler; hyperlink metadata; special-purpose search engine; Couplings; Crawlers; Educational institutions; Explosives; Information science; Power engineering and energy; Search engines; Uniform resource locators; Web pages; Web search;
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. Fourth International Conference on
Conference_Location
Haikou
Print_ISBN
978-0-7695-2874-8
Type
conf
DOI
10.1109/FSKD.2007.142
Filename
4406097