DocumentCode
2784159
Title
Focused crawler URL analysis model based on improved genetic algorithm
Author
Ning, Hui ; Wu, Hao ; He, Zhongzheng ; Tan, Yazhou
Author_Institution
Coll. of Comput. Sci. & Technol., Harbin Eng. Univ., Harbin, China
fYear
2011
fDate
7-10 Aug. 2011
Firstpage
2159
Lastpage
2164
Abstract
This paper analyses the URL analysis models used by existing focused crawlers, together with their pros and cons, and proposes a URL analysis model based on an improved genetic algorithm in which the selection, crossover and mutation operators are optimized. The user query is used to construct virtual documents that participate in the genetic process. The Rocchio feedback learning algorithm is used to amend the theme vector and to compute the degree of theme relevance of the anchor text. The experiment shows that the improved genetic algorithm can effectively collect topic pages.
Keywords
Internet; genetic algorithms; learning (artificial intelligence); mathematical operators; Rocchio feedback learning algorithm; URL analysis model; crossover operator; focused crawler; genetic algorithm; mutation operator; selection operator; Analytical models; Computational modeling; Crawlers; Genetic algorithms; Genetics; Search engines; Web pages; Focused Crawler; Genetic algorithm; URL analysis model;
fLanguage
English
Publisher
IEEE
Conference_Titel
Mechatronics and Automation (ICMA), 2011 International Conference on
Conference_Location
Beijing
ISSN
2152-7431
Print_ISBN
978-1-4244-8113-2
Type
conf
DOI
10.1109/ICMA.2011.5986315
Filename
5986315