DocumentCode :
2183747
Title :
Automatically generating labeled examples for Web wrapper maintenance
Author :
Raposo, Juan ; Pan, Alberto ; Alvarez, Manuel ; Hidalgo, Justo
Author_Institution :
A Coruna Univ., Spain
fYear :
2005
fDate :
19-22 Sept. 2005
Firstpage :
250
Lastpage :
256
Abstract :
For software programs to benefit fully from semi-structured Web sources, wrapper programs must be built to provide a "machine-readable" view over them. A significant problem with this approach is that, since Web sources are autonomous, they may undergo changes that invalidate the current wrapper. In this paper, we address this problem by introducing novel heuristics and algorithms for automatically maintaining wrappers. In our approach, the system collects some query results during normal wrapper operation and, when the source changes, uses them as input to generate a set of labeled examples for the source, which can then be used to induce a new wrapper. Our experiments show that the proposed techniques achieve high accuracy across a wide range of real-world Web data extraction problems.
Keywords :
Internet; software maintenance; Web data extraction; Web source; Web wrapper maintenance; machine readable; Application software; Data mining; Databases; Heuristic algorithms; Induction generators;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on
Print_ISBN :
0-7695-2415-X
Type :
conf
DOI :
10.1109/WI.2005.40
Filename :
1517850