DocumentCode :
351040
Title :
An integrated architecture for exploring, wrapping, mediating and restructuring information from the Web
Author :
May, Wolfgang
Author_Institution :
Inst. für Inf., Freiburg Univ., Germany
fYear :
2000
fDate :
2000
Firstpage :
82
Lastpage :
89
Abstract :
The goal of information extraction from the Web is to provide an integrated view on heterogeneous information sources. A main problem with current wrapper/mediator approaches is that they rely on very different formalisms and tools for wrappers and mediators, thus leading to an “impedance mismatch” between the wrapper and mediator level. Additionally, most approaches currently are tailored to access information from a fixed set of sources. In this paper we discuss an architecture where Web exploration, wrapping, mediation, and querying are done in an integrated system. Such an architecture reveals significant advantages in combination with a unified framework (i.e., data model and language) in which all tasks are done. Our approach is based on a unified model of the application-level information and the relevant fragment of the Web, and on an integrated language for accessing the Web, wrapping, mediating, and querying information. In this world model, in contrast to other approaches, the relevant part of the Web becomes a part of the internal world model of the system. This allows for a data-driven Web exploration which is independent of a given network of individual predefined wrappers and mediators. Thus, in addition to the classical wrapping and mediating functionality, a system in this architecture can be equipped with Web navigation and exploration functionality. In an abstract sense, the system comprises a universal wrapper which can be applied to arbitrary Web data sources which become known to the system during information processing. Equipped with suitably intelligent rules, the system can potentially explore previously unknown parts of the Web, thus coping with the steady growth of the Web. The architecture is implemented in the FLORID system.
Keywords :
information resources; information retrieval; query processing; Web; Web data sources; information extraction; integrated architecture; mediator; wrapper; Data mining; Impedance; Mediation; Navigation; Read only memory; Service oriented architecture; Tail; Wrapping;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database Conference, 2000. ADC 2000. Proceedings. 11th Australasian
Conference_Location :
Canberra, ACT
Print_ISBN :
0-7695-0528-7
Type :
conf
DOI :
10.1109/ADC.2000.819817
Filename :
819817