Title :
Looking at the Web through XML glasses
Author :
Sahuguet, Arnaud ; Azavant, Fabien
Author_Institution :
Dept. of Comput. & Inf. Sci., Pennsylvania Univ., Philadelphia, PA, USA
Abstract :
The Web so far has been incredibly successful at delivering information to human users. So successful actually, that there is now an urgent need to go beyond a browsing human and make information accessible to applications, in order to offer automation, inter-operation and Web-awareness among services. To do so, information from Web sources needs to be accessible in a structured way. XML and its various extensions (data-models, query languages) are a step in this direction. Unfortunately, the Web is not yet a well organized repository of nicely structured XML documents but rather a conglomerate of volatile HTML pages, for which structure has to be extracted. To address this problem, we present the World Wide Web Wrapper Factory (W4F), a Java toolkit for the generation of wrappers for Web sources. Our main contributions are: (1) an expressive language to specify the extraction of complex structures from HTML pages; (2) a declarative mapping to XML documents, with the automatic generation of the corresponding DTDs; (3) some visual supports to make the engineering of wrappers faster and easier. As an illustration, we show how we can, via W4F intermediation, transparently query HTML sources from an XML query language.
Keywords :
hypermedia markup languages; information resources; query languages; HTML pages; Java toolkit; Web sources; World Wide Web Wrapper Factory; XML; data-models; declarative mapping; expressive language; Automation; Database languages; Glass; HTML; Humans; Information science; Motion pictures; Production facilities; Read only memory
Conference_Titel :
Cooperative Information Systems, 1999. CoopIS '99. Proceedings. 1999 IFCIS International Conference on
Conference_Location :
Edinburgh
Print_ISBN :
0-7695-0384-5
DOI :
10.1109/COOPIS.1999.792166