DocumentCode :
424082
Title :
A stepwise learning approach to automatic discovery of interest data blocks
Author :
Yang, Pei ; Zheng, Qi-Lun ; Peng, Hong ; Tan, Qi
Author_Institution :
Inst. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou, China
Volume :
3
fYear :
2004
fDate :
26-29 Aug. 2004
Firstpage :
1441
Abstract :
The proliferation of online information sources has led to increased use of wrappers for extracting data from Web sources. A key problem with existing wrappers is that the wrapped rules learned from examples apply only to the specific Web site they were learned from. We propose a novel approach, DBFinder, to discover interest data blocks in a set of Web pages, which is a key step in data extraction. DBFinder works in two phases: semi-supervised wrapping and unsupervised wrapper. The first phase learns the wrapped rules for a specific sample Web site. The second phase generalizes those rules to other Web sites in the same domain as the sample Web site. Two data mining techniques, frequent sub-tree mining and association rule mining, are used to accomplish this. Detailed experiments are conducted to demonstrate the feasibility of the approach, and it has also been applied in a real application, a comparison-shopping agent.
Keywords :
Internet; Web sites; data mining; divide and conquer methods; hypermedia markup languages; information retrieval; learning by example; unsupervised learning; DBFinder method; Web pages; Web site; association rule mining; comparison shopping agent; data extraction; data mining technique; frequent subtree mining; interest data block discovery; online information sources; semisupervised wrapping; stepwise learning method; unsupervised wrapper; wrapped rules; Application software; Association rules; Computer applications; Computer science; Data engineering; Data mining; Neutron spin echo; Taxonomy; Wrapping
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on
Print_ISBN :
0-7803-8403-2
Type :
conf
DOI :
10.1109/ICMLC.2004.1382000
Filename :
1382000