DocumentCode
694411
Title
A topic-specific Web crawler based on content and structure mining
Author
Rong Qian ; Kejun Zhang ; Geng Zhao
Author_Institution
Dept. of Comput. Sci., Beijing Electron. Sci. & Technol. Inst., Beijing, China
fYear
2013
fDate
12-13 Oct. 2013
Firstpage
458
Lastpage
461
Abstract
This paper presents a topic-specific intelligent Web crawler based on Web content and structure mining. The method exploits the characteristics of neural networks and introduces reinforcement learning to estimate the relevance of crawled Web pages to the topic. When calculating this relevance, only the important tags in a Web page's HTML markup are selected to analyze the page's content and structure. Experiments show that the method clearly improves efficiency and accuracy.
Keywords
Internet; data mining; hypermedia markup languages; learning (artificial intelligence); neural nets; HTML markup; Web crawler; Web page content mining; Web page structure mining; neural network; reinforcement learning; Crawlers; Data mining; Learning (artificial intelligence); Neural networks; Search engines; Uniform resource locators; Web pages; Topic-specific; crawling algorithm; reinforcement learning; web content and structure mining
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Network Technology (ICCSNT), 2013 3rd International Conference on
Conference_Location
Dalian
Type
conf
DOI
10.1109/ICCSNT.2013.6967153
Filename
6967153