Title :
SiteRank-Based Crawling Ordering Strategy for Search Engines
Author :
Jiang, Qiancheng ; Zhang, Yan
Author_Institution :
Peking Univ., Beijing
Abstract :
Search engines play an increasingly important role in information discovery. Because of limits on time, network bandwidth, and hardware, we cannot obtain all the information on the Web and have to download important information first. In this paper we propose a novel crawling ordering strategy based on SiteRank. Experimental results on over 15 million pages indicate that it discovers important pages efficiently when page quality is evaluated by PageRank. Furthermore, it exhibits anti-spamming ability.
Keywords :
Internet; search engines; PageRank evaluation; SiteRank; World Wide Web; anti-spamming; crawling ordering strategy; network bandwidth; page quality; Bandwidth; Crawlers; Hardware; History; Information retrieval; Information technology; Laboratories; Uniform resource locators; Web pages
Conference_Title :
Computer and Information Technology, 2007. CIT 2007. 7th IEEE International Conference on
Conference_Location :
Aizu-Wakamatsu, Fukushima
Print_ISBN :
978-0-7695-2983-7
DOI :
10.1109/CIT.2007.35