Title : 
A topic-specific Web crawler based on content and structure mining
         
        
            Author : 
Rong Qian ; Kejun Zhang ; Geng Zhao
         
        
            Author_Institution : 
Dept. of Comput. Sci., Beijing Electron. Sci. & Technol. Inst., Beijing, China
         
        
        
        
        
        
            Abstract : 
This paper presents a topic-specific intelligent Web crawler based on Web content and structure mining. The method exploits the characteristics of neural networks and introduces reinforcement learning to estimate the relevance of crawled Web pages to the topic. When calculating this relevance, only the important tags of a page's HTML markup are selected to analyze the page's content and structure. Experiments show that the method clearly improves efficiency and accuracy.
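The idea of scoring a page from only its important HTML tags can be illustrated with a minimal sketch. This is not the authors' method: the abstract names a neural network with reinforcement learning, whereas the code below substitutes a plain weighted term overlap for illustration. The tag set, the weights in `TAG_WEIGHTS`, and the names `TagTextCollector` and `topic_relevance` are all assumptions; the abstract does not say which tags are selected or how they are weighted.

```python
from html.parser import HTMLParser

# Hypothetical tag weights (assumption): the abstract does not list
# the selected tags or their relative importance.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "a": 1.5, "p": 1.0}


class TagTextCollector(HTMLParser):
    """Collects words that appear inside the selected tags, with weights."""

    def __init__(self):
        super().__init__()
        self.stack = []           # currently open selected tags
        self.weighted_words = []  # list of (word, weight) pairs

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to (and including) the most recent matching tag.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if not self.stack:
            return  # text outside the selected tags is ignored
        weight = TAG_WEIGHTS[self.stack[-1]]
        for word in data.lower().split():
            self.weighted_words.append((word.strip(".,;:!?"), weight))


def topic_relevance(html, topic_terms):
    """Return the weighted share of selected-tag words that match the topic."""
    collector = TagTextCollector()
    collector.feed(html)
    collector.close()
    topic = {term.lower() for term in topic_terms}
    total = sum(weight for _, weight in collector.weighted_words)
    if total == 0:
        return 0.0
    matched = sum(w for word, w in collector.weighted_words if word in topic)
    return matched / total


page = (
    "<html><head><title>Web crawler</title></head>"
    "<body><p>cooking recipes</p><div>crawler</div></body></html>"
)
score = topic_relevance(page, ["crawler"])
```

A crawler could use such a score to rank outgoing links before fetching them; in the paper's setting the score would instead come from the learned model.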
         
        
            Keywords : 
Internet; data mining; hypermedia markup languages; learning (artificial intelligence); neural nets; HTML markup; Web crawler; Web page content mining; Web page structure mining; neural network; reinforcement learning; Crawlers; Data mining; Learning (artificial intelligence); Neural networks; Search engines; Uniform resource locators; Web pages; Topic-specific; crawling algorithm; reinforcement learning; web content and structure mining;
         
        
        
        
            Conference_Title : 
Computer Science and Network Technology (ICCSNT), 2013 3rd International Conference on
         
        
            Conference_Location : 
Dalian
         
        
        
            DOI : 
10.1109/ICCSNT.2013.6967153