DocumentCode :
2427815
Title :
An Improved Shark-Search Algorithm Based on Multi-information
Author :
Chen, Zhumin ; Ma, Jun ; Lei, JingSheng ; Yuan, Bo ; Lian, Li
Author_Institution :
Shandong Univ., Jinan
Volume :
4
fYear :
2007
fDate :
24-27 Aug. 2007
Firstpage :
659
Lastpage :
658
Abstract :
With the enormous growth of the World Wide Web, existing general-purpose search engines face increasing limitations. Focused crawling is increasingly seen as a potential solution. The key to focused crawling is accurately predicting how relevant the unvisited web pages pointed to by known URLs are to a given topic. A formal description of this prediction process is introduced. Then, four policies are proposed to predict the relevance of unvisited pages to a topic. Further, combinations of these policies are used to improve Shark-Search, a classic focused crawling algorithm based mainly on the textual information of Web pages. A large number of experiments were carried out to identify the best combination and to verify that the improved Shark-Search is more effective than the original.
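For orientation, the scoring that the original Shark-Search (Hersovici et al., 1998) assigns to an unvisited link can be sketched as below. This is a minimal sketch of the classic baseline only, not the improved algorithm or the four policies proposed in this paper. The function names, the bag-of-words cosine similarity, and the default parameter values (delta, beta, gamma) are illustrative assumptions.

```python
from collections import Counter
import math


def cosine_sim(query: str, text: str) -> float:
    """Bag-of-words cosine similarity in [0, 1].

    Assumption: a simple stand-in for the vector-space similarity
    Shark-Search uses to score text against the topic query.
    """
    q = Counter(query.lower().split())
    t = Counter(text.lower().split())
    dot = sum(q[w] * t[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in t.values())
    )
    return dot / norm if norm else 0.0


def shark_potential(query, parent_text, parent_inherited, anchor_text,
                    anchor_context, delta=0.5, beta=0.8, gamma=0.5):
    """Potential score of an unvisited child URL (hypothetical helper).

    delta: decay applied to relevance inherited from the parent page.
    beta:  weight of the anchor text versus its surrounding context.
    gamma: weight of inherited score versus neighborhood score.
    """
    # Inherited score: decayed relevance of the parent page, or the
    # parent's own decayed inherited score if the parent is irrelevant.
    parent_sim = cosine_sim(query, parent_text)
    inherited = delta * (parent_sim if parent_sim > 0 else parent_inherited)

    # Neighborhood score: the anchor text itself, plus its context
    # (the context counts fully whenever the anchor already matches).
    anchor = cosine_sim(query, anchor_text)
    context = 1.0 if anchor > 0 else cosine_sim(query, anchor_context)
    neighborhood = beta * anchor + (1 - beta) * context

    # Combine both sources of evidence into the link's priority.
    return gamma * inherited + (1 - gamma) * neighborhood
```

A crawler would push each extracted link into a priority queue keyed by this score, for example `shark_potential("focused crawler", "a focused crawler visits pages", 0.0, "crawler", "more about web crawling")`. The paper's contribution is to replace or augment these textual signals with additional information sources, which this sketch does not attempt to reproduce.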
Keywords :
Internet; search engines; Web pages; World Wide Web; focused crawling; general-purpose search engines; improved shark-search algorithm; multiinformation; textual information; Computer science; Crawlers; Educational institutions; Heuristic algorithms; Information science; Marine animals; Search engines; Uniform resource locators; Web pages; Web sites;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. Fourth International Conference on
Conference_Location :
Haikou
Print_ISBN :
978-0-7695-2874-8
Type :
conf
DOI :
10.1109/FSKD.2007.166
Filename :
4406469