DocumentCode :
3450700
Title :
An automatic approach to extracting review link from Chinese news pages
Author :
Wei Liu
Author_Institution :
Inf. Source Center, Inst. of Sci. & Tech. Inf. of China, Beijing, China
Volume :
2
fYear :
2011
fDate :
20-22 Aug. 2011
Firstpage :
411
Lastpage :
415
Abstract :
Review links are widely used in certain kinds of web pages, especially news pages. They are useful in many applications, such as hot topic discovery and public opinion monitoring. Unfortunately, extracting review links manually from news pages is time-consuming and error-prone. Although much work has been done on web data extraction, we argue that this problem remains non-trivial because of the diversity of both DOM tree structure and visual presentation. In this paper, a novel approach is proposed for automatically extracting review links from web pages. The approach consists of two steps: first, segment each news page into a set of blocks; then, identify the block(s) that contain the review link using a machine learning technique. Experimental results over a large number of Chinese news pages indicate that this approach is highly accurate.
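The two steps named in the abstract (segment the page into blocks, then classify blocks) can be illustrated with a minimal Python sketch. This is not the paper's method: the abstract does not give the segmentation rules, the feature set, or the learner. Everything below is an assumption made for illustration, including the block tags, the three features (link count, text length, presence of the word 评论, "comment"), and the one-level decision stump standing in for the machine learning technique. All function and class names are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: these tags start a new block. Real segmentation would also
# use DOM nesting and visual layout, which this sketch ignores.
BLOCK_TAGS = {"div", "td", "li", "p"}


class BlockSegmenter(HTMLParser):
    """Flat segmentation: every opening block tag starts a new block."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # each block: {"text": str, "links": [(href, anchor)]}
        self._href = None  # href of the <a> currently open, if any
        self._anchor = []

    def _current(self):
        if not self.blocks:
            self.blocks.append({"text": "", "links": []})
        return self.blocks[-1]

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.blocks.append({"text": "", "links": []})
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor = []

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor = "".join(self._anchor).strip()
            self._current()["links"].append((self._href, anchor))
            self._href = None

    def handle_data(self, data):
        self._current()["text"] += data
        if self._href is not None:
            self._anchor.append(data)


def segment(html):
    """Step 1: split a page into non-empty blocks."""
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    return [b for b in parser.blocks if b["text"].strip() or b["links"]]


def features(block):
    """Hypothetical features: link count, text length, comment keyword."""
    text = block["text"].strip()
    return [len(block["links"]), len(text), 1 if "评论" in text else 0]


def train_stump(X, y):
    """Fit a one-level decision tree: predict 1 when feature f >= t.

    Chooses the (f, t) pair with the fewest training errors.
    """
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            errors = sum((1 if row[f] >= t else 0) != label
                         for row, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]


def predict(stump, x):
    f, t = stump
    return 1 if x[f] >= t else 0


def find_review_links(html, stump):
    """Step 2: return hrefs from blocks classified as review-link blocks."""
    return [href
            for block in segment(html) if predict(stump, features(block))
            for href, _ in block["links"]]
```

To use the sketch, label the blocks of a few training pages as 1 (contains the review link) or 0, pass their feature vectors to `train_stump`, and call `find_review_links` on unseen pages. A full decision tree over richer DOM and visual features would replace the stump in any realistic setting.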
Keywords :
Internet; information retrieval; learning (artificial intelligence); reviews; tree data structures; Chinese news pages; DOM tree structure; Web data extraction; Web pages; machine learning technique; review link extraction; visual presentation; Data mining; Decision trees; Feature extraction; HTML; Training; Visualization; Machine learning; Review link; Visual feature
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Technology and Artificial Intelligence Conference (ITAIC), 2011 6th IEEE Joint International
Conference_Location :
Chongqing
Print_ISBN :
978-1-4244-8622-9
Type :
conf
DOI :
10.1109/ITAIC.2011.6030361
Filename :
6030361