Author_Institution :
Dept. Appl. Comput. Sci., Univ. of Winnipeg, Winnipeg, MB, Canada
Abstract :
In this paper, we discuss the architecture of a system called Web and Document Databases (WDDBS for short), designed to explore the Internet effectively and efficiently. Abstractly, a WDDBS can be defined as a triple <D, P, W>, where (1) D stands for a local document database that stores XML documents, (2) P for a subsystem responsible for remote query evaluation, including the resolution of semantic conflicts among heterogeneous databases, and (3) W for a Web crawler that should be able to find information sources related in some way to the local database. Each information source can then be organized into a WDDBS distributed over the Internet, which may be connected to others through URLs. A query submitted to a WDDBS is first evaluated against the local document database and then, if necessary, forwarded to some remote document databases; this forwarding is controlled by the 'knowledge' of how the local WDDBSs are connected. In this way, the traffic load over the Internet can be effectively decreased, while the information explored is more relevant.
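The local-first evaluation order described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class `WDDBS`, the function `evaluate`, the `min_results` threshold, and the node names are all hypothetical. Substring matching stands in for real XML query evaluation, a static `links` list stands in for what the crawler W would discover, and semantic conflict resolution in P is omitted entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class WDDBS:
    """Hypothetical stand-in for one <D, P, W> node."""
    name: str
    documents: Dict[str, str]                        # D: document id -> text (placeholder for XML)
    links: List[str] = field(default_factory=list)   # connected nodes (assumed found by W)

    def evaluate_local(self, term: str) -> List[str]:
        # Placeholder query: plain substring match instead of an XML query.
        return sorted(d for d, text in self.documents.items() if term in text)


def evaluate(term: str, start: str, network: Dict[str, WDDBS],
             min_results: int = 1) -> Tuple[List[Tuple[str, str]], Set[str]]:
    """Evaluate locally first; contact linked nodes only if too few results.

    Returns the (node, document id) hits and the set of nodes contacted.
    """
    visited: Set[str] = set()
    frontier: List[str] = [start]
    results: List[Tuple[str, str]] = []
    while frontier and len(results) < min_results:
        name = frontier.pop(0)
        if name in visited or name not in network:
            continue
        visited.add(name)
        node = network[name]
        results.extend((name, d) for d in node.evaluate_local(term))
        frontier.extend(node.links)   # follow the connection 'knowledge'
    return results, visited


# Two hypothetical nodes: "a" is the local node and links to "b".
network = {
    "a": WDDBS("a", {"x1": "xml query processing"}, links=["b"]),
    "b": WDDBS("b", {"y1": "signature tree index"}),
}
local_hits, local_contacted = evaluate("xml", "a", network)
remote_hits, remote_contacted = evaluate("signature", "a", network)
```

In this sketch a query answered by the local node never touches the network, which is the traffic-saving behavior the abstract claims; only a local miss causes linked nodes to be contacted.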
Keywords :
Internet; XML; database management systems; query processing; WDDBS; Web and document databases; Web crawler; XML documents; heterogeneous databases; local document database; remote query evaluation; system architecture; Books; Crawlers; Ontologies; Web; XML document; hash tables; semantic conflict resolution; signature trees; tree pattern queries;