DocumentCode :
2909926
Title :
A GNP-Based Scheduling Strategy for Distributed Crawling
Author :
Liu, Shuang ; Xu, Xiao ; Li, Dong ; Zhang, Wei-Zhe ; Liu, Xin-Ran
Author_Institution :
Dept. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
fYear :
2009
fDate :
7-8 Nov. 2009
Firstpage :
651
Lastpage :
655
Abstract :
To address the task scheduling and load balancing problems of distributed search engines, this paper proposes a GNP-based scheduling strategy for distributed crawling together with a load balancing method. An Internet distance estimation mechanism is used in place of large-scale network distance measurement, which both improves the system's response speed and reduces the load the system places on the WAN. By deploying crawling nodes across WANs, we built a distributed search engine and implemented several scheduling strategies. An online experiment shows a significant improvement in the system's performance.
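The abstract does not spell out the algorithm. As a rough illustration only, the Python sketch below shows the general GNP (Global Network Positioning) idea: a host measures round-trip times (RTTs) to a few landmarks and is placed in a coordinate space, so the distance between any crawler and any web server can be estimated from coordinates instead of being measured directly. It then applies a greedy "nearest crawler with spare capacity" rule as a stand-in for load balancing. The 2-D space, the gradient-descent fitting, the assumption that landmark coordinates are already known, the per-crawler capacity cap, and the names `fit_coordinates` and `assign_sites` are all hypothetical choices for this sketch, not the authors' method.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # hypothetical 2-D coordinate space


def fit_coordinates(landmarks: List[Coord], rtts: List[float],
                    steps: int = 2000, lr: float = 0.01) -> Coord:
    """Place a host in the coordinate space from its RTTs to landmarks.

    Assumes the landmark coordinates are already known (in GNP they are
    themselves derived from inter-landmark RTTs). Minimizes
    sum((||x - L_i|| - rtt_i)^2) by plain gradient descent.
    """
    # Start from the landmark centroid.
    x = sum(p[0] for p in landmarks) / len(landmarks)
    y = sum(p[1] for p in landmarks) / len(landmarks)
    for _ in range(steps):
        gx = gy = 0.0
        for (lx, ly), d in zip(landmarks, rtts):
            dist = math.hypot(x - lx, y - ly) or 1e-9  # avoid division by zero
            err = dist - d  # predicted minus measured distance
            gx += 2 * err * (x - lx) / dist
            gy += 2 * err * (y - ly) / dist
        x -= lr * gx
        y -= lr * gy
    return (x, y)


def assign_sites(crawlers: Dict[str, Coord], sites: Dict[str, Coord],
                 capacity: int) -> Dict[str, str]:
    """Assign each site to the closest crawler (by estimated distance)
    that still has spare capacity; a simple greedy load-balancing rule."""
    load = {c: 0 for c in crawlers}
    plan: Dict[str, str] = {}
    for site, pos in sites.items():
        # Rank crawlers by estimated (coordinate) distance to this site.
        ranked = sorted(crawlers, key=lambda c: math.dist(crawlers[c], pos))
        for c in ranked:
            if load[c] < capacity:
                plan[site] = c
                load[c] += 1
                break
        else:
            raise ValueError("total crawler capacity exceeded")
    return plan
```

In this sketch each host needs only a handful of RTT probes (one per landmark), and every crawler-to-server distance afterwards is a cheap coordinate computation, which mirrors the abstract's point about avoiding large-scale measurement traffic on the WAN.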
Keywords :
Internet; resource allocation; scheduling; search engines; GNP-based scheduling strategy; Internet distance estimating mechanism; WAN; distributed crawling; distributed search engines; global network positioning; large-scale network distance measurement; load balancing; task scheduling; Computer science; Educational institutions; Fault tolerant systems; History; Information systems; Load management; Logic; Peer to peer computing; Routing; Scalability; GNP; network measurement; scheduling strategies
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Information Systems and Mining, 2009. WISM 2009. International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3817-4
Type :
conf
DOI :
10.1109/WISM.2009.136
Filename :
5369005