DocumentCode :
1827965
Title :
Modelling on web dynamic incremental crawling and information processing
Author :
Kai Gao ; Wei Wang ; Shen Gao
Author_Institution :
Sch. of Inf. Sci. & Eng., Hebei Univ. of Sci. & Technol., Shijiazhuang, China
fYear :
2013
fDate :
Aug. 31 2013-Sept. 2 2013
Firstpage :
293
Lastpage :
298
Abstract :
The amount of web information is increasing rapidly, and it is continuously produced and updated anywhere and at any time through the Internet and social networks. A search engine must therefore keep up with the evolving web. How should the change be modelled, and which parts should be updated more often? To answer these questions, this paper presents a model of dynamic web evolution and an incremental crawling strategy, and addresses the choice of refresh interval with minimum waiting time. As a result, the crawling probability of some sites is higher than that of others, so these sites are given more opportunities to be updated. Based on an algorithm that adjusts web site priority levels, a dynamic web information gathering strategy is proposed. By monitoring the proposed metrics, the priority level of a web site can be adjusted dynamically. This is essential when bandwidth or other resources are limited. Further, some strategies for web information extraction and processing are also presented. The experimental results validate the feasibility of the approach.
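The idea of priority-driven incremental crawling described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the metrics or the update rule, so the five priority levels, the one-step raise/lower rule, the proportional mapping from level to crawling probability, and the site names are all assumptions made for illustration.

```python
import random

def adjust_priority(level, changed, min_level=1, max_level=5):
    """Raise the level when a revisit found changed content, lower it otherwise.

    The one-step rule and the 1..5 range are assumptions, not from the paper.
    """
    step = 1 if changed else -1
    return max(min_level, min(max_level, level + step))

def crawl_probabilities(levels):
    """Map each site's priority level to a crawling probability (proportional)."""
    total = sum(levels.values())
    return {site: level / total for site, level in levels.items()}

def pick_site(levels, rng):
    """Draw the next site to crawl, weighted by its priority level."""
    sites = list(levels)
    weights = [levels[s] for s in sites]
    return rng.choices(sites, weights=weights, k=1)[0]

# Hypothetical sites, all starting at the same priority level.
levels = {"news.example": 3, "blog.example": 3, "static.example": 3}

# Hypothetical monitoring results: (site, whether its content had changed).
observations = [
    ("news.example", True),
    ("static.example", False),
    ("news.example", True),
]
for site, changed in observations:
    levels[site] = adjust_priority(levels[site], changed)

probs = crawl_probabilities(levels)
next_site = pick_site(levels, random.Random(0))
```

Under these assumptions, a site that keeps changing accumulates a higher level and is therefore drawn more often, while a static site is revisited less frequently, which is one way to spend limited bandwidth where updates are most likely.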
Keywords :
Internet; information retrieval; search engines; social networking (online); Web dynamic incremental crawling; Web site priority level adjusted algorithm; crawling probability; dynamic Web evolution; dynamic Web information gathering strategy; incremental crawling strategy; information processing; search engine; social networks; Ink; Monitoring; Yttrium; Search engine; crawler; information extraction; refresh
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Modelling, Identification & Control (ICMIC), 2013 Proceedings of International Conference on
Conference_Location :
Cairo
Print_ISBN :
978-0-9567157-3-9
Type :
conf
Filename :
6642201