• DocumentCode
    2784159
  • Title
    Focused crawler URL analysis model based on improved genetic algorithm
  • Author
    Ning, Hui; Wu, Hao; He, Zhongzheng; Tan, Yazhou
  • Author_Institution
    Coll. of Comput. Sci. & Technol., Harbin Eng. Univ., Harbin, China
  • fYear
    2011
  • fDate
    7-10 Aug. 2011
  • Firstpage
    2159
  • Lastpage
    2164
  • Abstract
    This paper analyses the URL analysis models used by existing focused crawlers, together with their advantages and disadvantages, and then proposes a URL analysis model based on an improved genetic algorithm in which the selection, crossover, and mutation operators are optimized. The user query is used to construct virtual documents that participate in the genetic process. The Rocchio feedback learning algorithm is used to update the theme vector and to compute the topic relevance of anchor text. Experimental results show that the improved genetic algorithm can effectively collect topic pages.
  • Keywords
    Internet; genetic algorithms; learning (artificial intelligence); mathematical operators; Rocchio feedback learning algorithm; URL analysis model; crossover operator; focused crawler; genetic algorithm; mutation operator; selection operator; Analytical models; Computational modeling; Crawlers; Genetic algorithms; Genetics; Search engines; Web pages; Focused Crawler; Genetic algorithm; URL analysis model;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Mechatronics and Automation (ICMA), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    2152-7431
  • Print_ISBN
    978-1-4244-8113-2
  • Type
    conf
  • DOI
    10.1109/ICMA.2011.5986315
  • Filename
    5986315
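The abstract names the Rocchio feedback learning algorithm as the tool for updating the theme vector and scoring anchor text, but the record gives no parameters or representation details. The following is a minimal sketch of the textbook Rocchio update followed by cosine scoring of anchor text, assuming bag-of-words term-frequency vectors and the commonly cited weights alpha = 1.0, beta = 0.75, gamma = 0.15. It is not the authors' implementation; the function names `term_vector`, `rocchio_update`, and `cosine` are hypothetical.

```python
"""Sketch: Rocchio-style update of a theme (topic) vector and cosine scoring
of anchor text. Tokenization and weights are illustrative assumptions, not
the parameters used in the paper."""
import math
from collections import Counter
from typing import Dict, Mapping, Sequence


def term_vector(text: str) -> Counter:
    # Naive bag-of-words (assumption): lowercase, split on whitespace, count terms.
    return Counter(text.lower().split())


def rocchio_update(
    theme: Mapping[str, float],
    relevant: Sequence[Mapping[str, float]],
    nonrelevant: Sequence[Mapping[str, float]],
    alpha: float = 1.0,
    beta: float = 0.75,
    gamma: float = 0.15,
) -> Dict[str, float]:
    """Return alpha*theme + beta*mean(relevant) - gamma*mean(nonrelevant),
    keeping only terms whose resulting weight is positive."""
    updated = {t: alpha * w for t, w in theme.items()}
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue  # an empty feedback set contributes nothing
        for doc in docs:
            for t, w in doc.items():
                # Dividing by len(docs) turns the sum into the centroid (mean).
                updated[t] = updated.get(t, 0.0) + coef * w / len(docs)
    return {t: w for t, w in updated.items() if w > 0}


def cosine(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Cosine similarity of two sparse term vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Hypothetical feedback: one page judged relevant, one judged non-relevant.
    theme = term_vector("genetic algorithm focused crawler")
    relevant = [term_vector("improved genetic algorithm for focused crawler")]
    nonrelevant = [term_vector("football match report")]
    theme = rocchio_update(theme, relevant, nonrelevant)
    # Score candidate anchor texts against the updated theme vector.
    for anchor in ["genetic algorithm tutorial", "football results"]:
        print(anchor, round(cosine(theme, term_vector(anchor)), 3))
```

Dropping non-positive weights after the update is a common convention that keeps the theme vector sparse; in a crawler, the resulting anchor-text score would be one input to URL prioritization, alongside whatever fitness the genetic algorithm uses.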