DocumentCode :
3497023
Title :
URL Assignment Algorithm of Crawler in Distributed System Based on Hash
Author :
Wan, Yuan ; Tong, Hengqing
Author_Institution :
Wuhan Univ. of Technol., Xiamen
fYear :
2008
fDate :
6-8 April 2008
Firstpage :
1632
Lastpage :
1635
Abstract :
Web crawlers are a key component of Internet services that provide search and indexing support for the entire Web, for corporate intranets, and for large portal sites. More recently, crawlers have also been used as tools to conduct focused Web searches and to gather data about the characteristics of the WWW. In this paper, we study the gathering model of a crawler in a distributed environment. We describe the function of every module and establish rules that crawlers must follow to keep the load balanced and the system robust when they search the Web simultaneously. We then design and implement a new hash-based URL assignment algorithm for partitioning the domain to crawl and, more generally, discuss the complete decentralization of every task.
Keywords :
Internet; cryptography; intranets; Web; corporate Intranets; crawler URL assignment algorithm; distributed system; Algorithm design and analysis; Crawlers; Indexing; Partitioning algorithms; Portals; Robustness; Uniform resource locators; Web and internet services; Web search; World Wide Web; Hash algorithm; URL assignment; distributed crawler; gathering model
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Networking, Sensing and Control, 2008. ICNSC 2008. IEEE International Conference on
Conference_Location :
Sanya
Print_ISBN :
978-1-4244-1685-1
Electronic_ISBN :
978-1-4244-1686-8
Type :
conf
DOI :
10.1109/ICNSC.2008.4525482
Filename :
4525482