Title :
Improvement of PageRank for Focused Crawler
Author :
Yuan, Fuyong ; Yin, Chunxia ; Liu, Jian
Author_Institution :
Yanshan Univ., Qinhuangdao
fDate :
July 30 2007-Aug. 1 2007
Abstract :
The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers. Focused crawlers are developed to collect web pages relevant to topics of interest from the Internet. The PageRank algorithm is used to rank web pages. It estimates a page's authority by taking into account the link structure of the Web. However, it assigns each outlink the same weight and is independent of topic, which results in topic drift. In this paper, we propose an improved PageRank algorithm, called "T-PageRank", which is based on a "topical random surfer". In experiments, a focused crawler using T-PageRank performs better than crawlers using the breadth-first and PageRank algorithms.
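For orientation, the uniform outlink weighting that the abstract criticizes can be sketched as follows. This is standard PageRank computed by power iteration, not the authors' T-PageRank, whose formulation is not given in this record. The function name `pagerank`, the `links` dictionary layout, the damping factor of 0.85, the iteration count, and the even redistribution of rank from pages without outlinks are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Standard PageRank by power iteration (illustrative sketch).

    links maps each page to the list of pages it links to (assumed layout).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Random-jump term: every page receives the same baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outlink gets the same weight, regardless of topic.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outlinks spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 4))
```

Because the share passed along each outlink depends only on the number of outlinks, a link to an off-topic page carries as much weight as a link to an on-topic one, which is the source of the topic drift mentioned in the abstract.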
Keywords :
Internet; Web sites; information retrieval; PageRank algorithm; Web pages; Web pages ranking; World-Wide Web; focused crawler; general-purpose crawlers; topical random surfer; Artificial intelligence; Crawlers; Distributed computing; Educational institutions; Information science; Search engines; Software engineering; Uniform resource locators; PageRank; T-PageRank;
Conference_Title :
Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, 2007. SNPD 2007. Eighth ACIS International Conference on
Conference_Location :
Qingdao
Print_ISBN :
978-0-7695-2909-7
DOI :
10.1109/SNPD.2007.458