DocumentCode :
2460835
Title :
The Research and Implementation of Parallel Web Crawler in Cluster
Author :
Wu, Min ; Lai, Junliang
Author_Institution :
Sch. of Comput. Sci., Southwest Petro Univ., Chengdu, China
fYear :
2010
fDate :
17-19 Dec. 2010
Firstpage :
704
Lastpage :
708
Abstract :
As the foundational component of web information acquisition, the web crawler has long been a research hotspot in academia and industry, and parallel web crawling is currently the main research direction. To address the shortcomings of the center-like dynamic assignment and distributed static assignment structures adopted by current parallel web crawlers, this paper presents a parallel web crawler for a cluster environment. The crawler adopts a dynamic assignment structure and introduces a distributed controller pattern, which eliminates the single point of failure of a central controller and enhances the system's concurrency. In addition, it uses URL dynamic assignment technology to balance load among components dynamically according to their real-time condition.
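The abstract does not say how URLs are balanced across crawler components. As a hedged illustration only, the Python sketch below assumes a simple least-loaded policy in which a node's "real-time condition" is approximated by the length of its pending URL queue. The names `CrawlerNode` and `assign_urls` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class CrawlerNode:
    """Hypothetical crawler node; load is approximated by its queue length."""
    name: str
    pending: list = field(default_factory=list)  # URLs waiting to be fetched

    def load(self) -> int:
        # Assumption: queue length stands in for the node's real-time condition.
        return len(self.pending)


def assign_urls(urls, nodes):
    """Dynamically assign each URL to the currently least-loaded node.

    Load is re-evaluated before every assignment, so a node that was busy
    when the batch arrived stops receiving work until others catch up.
    Ties are broken by node name to keep the result deterministic.
    """
    for url in urls:
        target = min(nodes, key=lambda n: (n.load(), n.name))
        target.pending.append(url)
    return {n.name: list(n.pending) for n in nodes}


if __name__ == "__main__":
    # Node A is already busy, so new URLs flow to B and C first.
    cluster = [CrawlerNode("A", ["x", "y"]), CrawlerNode("B"), CrawlerNode("C")]
    print(assign_urls(["u1", "u2", "u3"], cluster))
```

Unlike a static scheme (for example, hashing each URL to a fixed node), this policy consults current load at assignment time. That is the property the abstract attributes to dynamic assignment, though the paper's actual load metric and protocol may differ.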
Keywords :
Internet; information retrieval; URL dynamic assignment technology; Web information acquisition; center-like dynamic assignment structure; cluster environment; distributed controller pattern; distributed static assignment; parallel Web crawler; Control systems; Crawlers; Databases; Internet; Process control; Protocols; Web pages; Cluster; Dynamic assignment; Parallelism; Web crawler;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Computational and Information Sciences (ICCIS), 2010 International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-8814-8
Electronic_ISBN :
978-0-7695-4270-6
Type :
conf
DOI :
10.1109/ICCIS.2010.175
Filename :
5709184