Title :
The Research and Implementation of Parallel Web Crawler in Cluster
Author :
Wu, Min ; Lai, Junliang
Author_Institution :
Sch. of Comput. Sci., Southwest Petro Univ., Chengdu, China
Abstract :
As the foundational component of web information acquisition, the web crawler has long been a research hotspot in academia and industry, and parallel web crawlers are now the main research direction. Current parallel web crawlers adopt either centralized dynamic assignment or distributed static assignment, and both have shortcomings. To address them, this paper presents a parallel web crawler for a cluster environment. It adopts a dynamic assignment structure and introduces a distributed controller pattern, which removes the single point of failure of a central controller and improves the system's concurrency. In addition, it uses URL dynamic assignment technology to balance load among components dynamically according to their real-time condition.
Keywords :
Internet; information retrieval; URL dynamic assignment technology; Web information acquisition; center-like dynamic assignment structure; cluster environment; distributed controller pattern; distributed static assignment; parallel Web crawler; Control systems; Crawlers; Databases; Process control; Protocols; Web pages; Cluster; Dynamic assignment; Parallelism; Web crawler;
Conference_Title :
Computational and Information Sciences (ICCIS), 2010 International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-8814-8
Electronic_ISBN :
978-0-7695-4270-6
DOI :
10.1109/ICCIS.2010.175