DocumentCode :
492503
Title :
A Framework of a Hybrid Focused Web Crawler
Author :
Sun, Yixue ; Jin, Peiquan ; Yue, Lihua
Author_Institution :
Dept. of Comput. Sci. & Technol., Univ. of Sci. & Technol. of China
Volume :
2
fYear :
2008
fDate :
13-15 Dec. 2008
Firstpage :
50
Lastpage :
53
Abstract :
Because of the complex structure of the Web, most approaches to focused crawling employ a local search algorithm, which searches pages only within a sub-graph of the Web. In addition, the multi-topic nature of Web pages makes it difficult to determine the relevance of a page to a given topic. To address these two issues, we present a new hybrid approach to focused crawling based on meta-search and the VIPS (VIsion-based Page Segmentation) algorithm. We use meta-search to achieve a wider crawling range than traditional local search algorithms. To obtain better recall and precision, we use a VIPS-based algorithm to compute the relevance of a Web page; it first partitions the page into a set of blocks that reflect the page's semantic structure. After a short review of related work, we discuss the system architecture of the hybrid focused crawler and then present the framework of the hybrid focused crawling approach.
Keywords :
Internet; query formulation; VIPS algorithm; Web pages; Web sub-graph; hybrid focused Web crawler; local search algorithm; meta-search; page semantic structure; vision based page segmentation algorithm; Algorithm design and analysis; Conferences; Crawlers; HTML; Hybrid power systems; Metasearch; Partitioning algorithms; Performance analysis; Uniform resource locators
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Future Generation Communication and Networking Symposia, 2008. FGCNS '08. Second International Conference on
Conference_Location :
Sanya
Print_ISBN :
978-1-4244-3430-5
Electronic_ISBN :
978-0-7695-3546-3
Type :
conf
DOI :
10.1109/FGCNS.2008.73
Filename :
4813520