Title :
Web Object Block Mining Based on Tag Similarity
Author :
Liu, Rui ; Xiong, Rui ; Gao, Kun
Author_Institution :
State Key Lab. of Software Dev. Environ., Beihang Univ., Beijing, China
Abstract :
Currently, a large amount of Web information on the Internet is presented as structured objects. Mining object information from the Web is of great importance for Web data management. This paper presents a Web object block mining method based on tag similarity. It first constructs a DOM tree for the Web page and calculates the similarity of all possible generalized nodes. A pruning method then filters out redundant information based on the features of noise data and locates the Web object region. Finally, the Web objects are identified within the Web object region. The experimental results show that, compared with IEPAD, our method achieves higher precision.
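The pipeline described in the abstract (build a DOM tree, score sibling subtrees by tag similarity, then pick the region whose children repeat) can be illustrated with a minimal sketch. The abstract does not give the paper's similarity measure, generalized-node definition, or pruning rules, so everything below is an assumption made for illustration: the similarity is a difflib ratio over preorder tag sequences, each child subtree is treated as one candidate object, and the names Node, TreeBuilder, tag_similarity, find_object_region, threshold and min_objects are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements that never have a closing tag; they are added as leaves.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a bare tag tree; assumes reasonably well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend only into elements that can have children

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.current.parent is not None:
            self.current = self.current.parent


def tag_sequence(node):
    """Preorder list of tag names in the subtree rooted at node."""
    seq = [node.tag]
    for child in node.children:
        seq.extend(tag_sequence(child))
    return seq


def tag_similarity(a, b):
    """Assumed measure (not the paper's): difflib ratio of the two tag sequences."""
    return SequenceMatcher(None, tag_sequence(a), tag_sequence(b)).ratio()


def find_object_region(root, threshold=0.8, min_objects=3):
    """Return the node with the most children whose adjacent siblings are all similar."""
    best = None
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        kids = node.children
        if len(kids) < min_objects:
            continue  # too few repeated blocks to count as an object region
        if all(tag_similarity(x, y) >= threshold for x, y in zip(kids, kids[1:])):
            if best is None or len(kids) > len(best.children):
                best = node
    return best
```

To try it, feed a page to TreeBuilder with feed() and pass builder.root to find_object_region; the children of the returned node are the candidate Web objects. A single fixed threshold is a simplification: it stands in for the pruning step, which the abstract says relies on features of noise data.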
Keywords :
Internet; data mining; information retrieval; trees (mathematics); DOM tree; Web data management; Web information; Web object block mining method; pruning method; tag similarity; Automation; HTML; Information filtering; Information filters; Intelligent structures; Machine learning algorithms; Programming; Web pages; Generalized Node; Information Extraction; Web Object Region
Conference_Title :
Intelligent Computation Technology and Automation (ICICTA), 2010 International Conference on
Conference_Location :
Changsha
Print_ISBN :
978-1-4244-7279-6
Electronic_ISBN :
978-1-4244-7280-2
DOI :
10.1109/ICICTA.2010.684