DocumentCode :
2221144
Title :
The Improved Pagerank in Web Crawler
Author :
Ling Zhang ; Zheng Qin
Author_Institution :
Dept. of Inf. Sci. & Eng., Normal Univ., Changsha, China
fYear :
2009
fDate :
26-28 Dec. 2009
Firstpage :
1889
Lastpage :
1892
Abstract :
PageRank is an algorithm for rating web pages. It applies the citation relationship used among academic papers to evaluate a web page's authority. Because it gives the same weight to all edges and ignores the relevance of web pages to the topic, it suffers from topic drift. Based on an analysis of several PageRank algorithms, an improved PageRank based on thematic segments is proposed. In this algorithm, a web page is divided into several blocks according to the HTML document's structure, and the greatest weight is given to links in the block that is most relevant to the given topic. Moreover, the visited outlinks are treated as feedback to adjust the blocks' relevance. An experiment with a Web crawler shows that the new algorithm has some effect on resolving the topic-drift problem.
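The abstract does not give the paper's formula, so the following is only a minimal sketch of the general idea: a PageRank iteration in which each link carries a weight standing in for the topic relevance of the block that contains it. The function name `weighted_pagerank`, the damping factor 0.85, the iteration count, the page names, and the relevance values are all assumptions made for illustration, and the feedback step on visited outlinks is not modeled.

```python
def weighted_pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank with per-link weights.

    links maps a source page to a list of (target page, weight) pairs.
    The weight is a hypothetical stand-in for the relevance of the
    block in which the link appears; it is not the paper's formula.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(target for target, _ in targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        dangling = 0.0
        for page in pages:
            out = links.get(page, [])
            total = sum(weight for _, weight in out)
            if total <= 0:
                # No usable outlinks: redistribute this rank uniformly.
                dangling += rank[page]
                continue
            for target, weight in out:
                # Share rank in proportion to the link's block weight.
                new_rank[target] += damping * rank[page] * weight / total
        for page in pages:
            new_rank[page] += damping * dangling / n
        rank = new_rank
    return rank


# Hypothetical example: one seed page with a link in a relevant block
# (weight 0.9) and a link in an off-topic block (weight 0.1).
example = weighted_pagerank(
    {"seed": [("on_topic", 0.9), ("off_topic", 0.1)]}
)
```

With equal weights this reduces to the standard uniform split of a page's rank across its outlinks; unequal weights shift rank toward pages linked from the more relevant block, which is the effect the abstract attributes to the block-based approach.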
Keywords :
Web sites; citation analysis; hypermedia markup languages; search engines; HTML document; Web crawler; Web page rating; academic paper; block relevancy; citation; pagerank; thematic segment; topic-drift; Algorithm design and analysis; Couplings; Crawlers; Feedback; HTML; Information science; Internet; Search engines; Uniform resource locators; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Science and Engineering (ICISE), 2009 1st International Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4244-4909-5
Type :
conf
DOI :
10.1109/ICISE.2009.1220
Filename :
5455065