DocumentCode :
2860572
Title :
Efficient Wrapper Reinduction from Dynamic Web Sources
Author :
Mohapatra, Roshni ; Rajaraman, Kanagasabai ; Yuan, Sung Sam
Author_Institution :
Institute for Infocomm Research, Singapore
fYear :
2004
fDate :
20-24 Sept. 2004
Firstpage :
391
Lastpage :
397
Abstract :
This paper investigates wrapper induction from web sites whose layout may change over time. We formulate reinduction as an incremental learning problem and identify wrapper induction from an incomplete label as a key problem to be solved. We propose a novel algorithm for incrementally inducing LR wrappers and show that this algorithm asymptotically identifies the correct wrapper as the number of tuples increases. This property is used to propose an LR wrapper reinduction algorithm. The algorithm requires examples to be provided exactly once; thereafter, it can detect layout changes and reinduce wrappers automatically. In experimental studies, we observe that the reinduction algorithm achieves near-perfect performance.
Keywords :
Algorithm design and analysis; Change detection algorithms; Data mining; HTML; Lifting equipment; Performance analysis; USA Councils; Web pages; Web sites;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN :
0-7695-2100-2
Type :
conf
DOI :
10.1109/WI.2004.10043
Filename :
1410831