DocumentCode
1666167
Title
Research on a dynamic adjust crawling algorithm for guiding the topic crawler through Tunnels
Author
Xu, Chang ; Jian-guo, Xu ; Bin, Jia
Author_Institution
College of Information and Engineering, Shandong University of Science and Technology, Qingdao, China
fYear
2011
Firstpage
1
Lastpage
4
Abstract
Tunneling is a central problem for topic crawlers. Building on the vector space model (VSM), this paper adds the influence of a web document's text structure to the topic-similarity measure, improving the VSM text classification algorithm so that its predictions are more accurate, and applies the improved algorithm to a dynamically adjusted topic crawler that passes through tunnels. After analyzing the features of Web Communities and tunneling, the paper also takes the genetic factors of parent and child pages into account in the web page similarity calculation. To overcome the shortcomings of the traditional tunnel method, it presents a new algorithm in which the crawler dynamically adjusts the K value during crawling according to a calculated strategy, so that Web Communities and tunnels form relatively complete thematic clusters and the web crawl rate improves.
Keywords
Classification algorithms; Communities; Crawlers; Educational institutions; Heuristic algorithms; Prediction algorithms; Text categorization; Topic crawler; Topic similarity; Tunneling; VSM; Web Community;
fLanguage
English
Publisher
IEEE
Conference_Titel
E-Business and E-Government (ICEE), 2011 International Conference on
Conference_Location
Shanghai, China
Print_ISBN
978-1-4244-8691-5
Type
conf
DOI
10.1109/ICEBEG.2011.5884527
Filename
5884527