DocumentCode :
3354665
Title :
PPSpider: Towards an Efficient and Robust Topic-Specific Crawler Based on Peer-to-Peer Network
Author :
Liu, Dongfei ; Liu, Jia
Author_Institution :
Dept. of Comput. Sci. & Technol., Wuhan Univ. of Technol., Wuhan, China
Volume :
1
fYear :
2009
fDate :
28-30 Oct. 2009
Firstpage :
101
Lastpage :
105
Abstract :
As the Internet grows exponentially, topic-specific Web crawlers are becoming increasingly popular in Web data mining and search as a way to serve users interested in a specific area. Meanwhile, a few peer-to-peer based Web search engines have been proposed to cope with problems such as the single point of failure in current centralized crawler architectures. Topic relevance has been studied in depth, whereas the selection of starting URLs has seen no significant progress for a long time. This paper introduces "PPSpider" to achieve highly efficient Web crawling for peer-to-peer based topic-specific crawlers. Each peer of PPSpider uses locally browsed Web pages as candidate starting URLs to mitigate the tunnel problem that troubles current Web crawlers. To adapt to a dynamic network environment, a corresponding crawling algorithm and URL transmission mechanisms were implemented for PPSpider to help accelerate the joint crawling process and reduce network communication traffic among peers. The evaluation results show the benefits of PPSpider, and existing relevance algorithms can be easily deployed on PPSpider to further improve crawling efficiency.
Keywords :
Internet; data mining; peer-to-peer computing; search engines; software tools; Internet; PPSpider leverages local browsed Web pages; Web data mining; centralized architecture crawlers; dynamic network environment; joint crawling process acceleration; network communication flow; peer-to-peer based Web search engines; peer-to-peer network; single point of failure; topic-specific Web crawler; tunnel problem; Crawlers; Data mining; Internet; Peer to peer computing; Robustness; Search engines; Service oriented architecture; Uniform resource locators; Web pages; Web search; DHT; Peer-to-Peer; Starting URL; Topic-Specific; Web Crawler;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science and Engineering, 2009. WCSE '09. Second International Workshop on
Conference_Location :
Qingdao
Print_ISBN :
978-0-7695-3881-5
Type :
conf
DOI :
10.1109/WCSE.2009.631
Filename :
5403448