• DocumentCode
    694411
  • Title
    A topic-specific Web crawler based on content and structure mining
  • Author
    Rong Qian ; Kejun Zhang ; Geng Zhao
  • Author_Institution
    Dept. of Comput. Sci., Beijing Electron. Sci. & Technol. Inst., Beijing, China
  • fYear
    2013
  • fDate
    12-13 Oct. 2013
  • Firstpage
    458
  • Lastpage
    461
  • Abstract
    This paper discusses a topic-specific intelligent Web crawler based on Web content and structure mining. The method exploits the characteristics of neural networks and introduces reinforcement learning to estimate the relevance of crawled Web pages to the topic. When calculating this correlation, only the important tags of a page's HTML markup are selected to analyze the page's content and structure. The experiments show that the method clearly improves efficiency and accuracy.
  • Keywords
    Internet; data mining; hypermedia markup languages; learning (artificial intelligence); neural nets; HTML markup; Web crawler; Web page content mining; Web page structure mining; neural network; reinforcement learning; Crawlers; Data mining; Learning (artificial intelligence); Neural networks; Search engines; Uniform resource locators; Web pages; Topic-specific; crawling algorithm; reinforcement learning; web content and structure mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Network Technology (ICCSNT), 2013 3rd International Conference on
  • Conference_Location
    Dalian
  • Type
    conf
  • DOI
    10.1109/ICCSNT.2013.6967153
  • Filename
    6967153
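
The abstract says relevance is computed from only the important HTML tags of a page. The sketch below illustrates that tag-selection step alone, under stated assumptions: the tag set, the weights in TAG_WEIGHTS, and the names ImportantTagText and topic_relevance are hypothetical and do not come from the paper. A plain weighted term ratio stands in for the neural network and reinforcement learning components, which the abstract does not specify.

```python
from html.parser import HTMLParser
import re

# Hypothetical weights: the abstract does not say which tags count as
# "important" or how they are weighted.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5,
               "a": 1.0, "b": 1.0, "strong": 1.0}


class ImportantTagText(HTMLParser):
    """Collects (weight, word) pairs for text inside the selected tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []       # important tags currently open
        self.weighted_words = []  # (weight, lowercase word) pairs

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Remove the most recently opened matching tag.
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx]

    def handle_data(self, data):
        if not self.open_tags:
            return  # text outside the selected tags is ignored
        weight = max(TAG_WEIGHTS[t] for t in self.open_tags)
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            self.weighted_words.append((weight, word))


def topic_relevance(html, topic_terms):
    """Weighted share of topic terms among words in the selected tags."""
    parser = ImportantTagText()
    parser.feed(html)
    parser.close()
    total = sum(w for w, _ in parser.weighted_words)
    if total == 0:
        return 0.0
    terms = {t.lower() for t in topic_terms}
    hits = sum(w for w, word in parser.weighted_words if word in terms)
    return hits / total
```

A crawler could call topic_relevance(page_html, ["crawler", "mining"]) on each fetched page and prioritise links from pages with higher scores. Restricting the text to a few tags keeps the per-page cost low, at the price of ignoring body paragraphs.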