DocumentCode :
2186566
Title :
ITPilot: a toolkit for industrial-strength Web data extraction
Author :
Pan, Alberto ; Raposo, Juan ; Álvarez, Manuel ; Montoto, Paula ; Losada, José ; Hidalgo, Justo
Author_Institution :
A Coruna Univ., Spain
fYear :
2005
fDate :
19-22 Sept. 2005
Firstpage :
798
Lastpage :
801
Abstract :
In recent years, many research systems have been proposed to perform data extraction and automation tasks on Web sources. Since most of today's Web sources are "human-readable" but not "machine-readable", these systems must address a number of difficult challenges, such as dealing with complex navigation sequences, extracting data from HTML pages and reacting to source changes. Denodo Corporation has developed ITPilot, an industrial-strength solution that allows complex "wrappers" for Web sources to be graphically generated and automatically maintained. This paper presents the architecture and the basic ideas "behind the scenes" in ITPilot.
Keywords :
Web sites; hypermedia markup languages; information retrieval; HTML page; ITPilot; Web data extraction; Web sources; Web wrapper; industrial-strength solution; Automation; Books; Computer architecture; Computer languages; Data mining; HTML; Java; Navigation; Web services; World Wide Web;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on
Print_ISBN :
0-7695-2415-X
Type :
conf
DOI :
10.1109/WI.2005.85
Filename :
1517958