Title :
An Improved Shark-Search Algorithm Based on Multi-information
Author :
Chen, Zhumin ; Ma, Jun ; Lei, JingSheng ; Yuan, Bo ; Lian, Li
Author_Institution :
Shandong Univ., Jinan
Abstract :
With the enormous growth of the World Wide Web, the limitations of existing general-purpose search engines have become increasingly apparent. Focused crawling is increasingly seen as a potential solution. The key to focused crawling is accurately predicting the relevance, to a given topic, of unvisited Web pages pointed to by known URLs. A formalized description of this prediction process is introduced. Four policies are then proposed to predict the relevance of unvisited pages to a topic. Combinations of these policies are further used to improve Shark-Search, a classic focused crawling algorithm based mainly on the textual information of Web pages. A large number of experiments were carried out to identify the best-performing combination and to verify that the improved Shark-Search is more effective than the original.
Keywords :
Internet; search engines; Web pages; World Wide Web; focused crawling; general-purpose search engines; improved shark-search algorithm; multiinformation; textual information; Computer science; Crawlers; Educational institutions; Heuristic algorithms; Information science; Marine animals; Uniform resource locators; Web sites
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. Fourth International Conference on
Conference_Location :
Haikou
Print_ISBN :
978-0-7695-2874-8
DOI :
10.1109/FSKD.2007.166