Title :
An indexing model for structured documents to support queries on content, structure and attributes
Author_Institution :
Dept. of Comput. Sci., R. Melbourne Inst. of Technol., Vic., Australia
Abstract :
The complex internal structure of documents can be described and captured by documentation representation standards such as SGML and SGML-related standards like HTML and XML. The hierarchical structure of documents, the attributes of documents, and the attributes of document components at all levels of the document hierarchy can be encoded with markup tags. Traditional text database systems support only queries on content. They do not capture the rich structural information contained in documents or the attributes of document components, so they cannot support queries on structure and attributes. We propose a text model, a query language, and an indexing scheme that support queries on the content, structure, and attributes of documents, as well as on the attributes of text elements within documents. The model is schema-independent, and query evaluation time is at worst linear. We show that our indexing scheme can efficiently support a wide range of queries in a database for highly heterogeneous collections of structured documents. We provide query examples to show how all the information encoded in documents marked up according to the TEI Guidelines, an encoding standard adopted by the humanities disciplines, can be indexed and queried in our indexing model.
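The abstract does not describe the index layout, so the following is only a minimal illustrative sketch of the general idea of querying content, structure, and attributes together without a schema. It is not the authors' model or query language. The function names (`build_index`, `query`), the path notation, and the sample documents are all hypothetical, and text that follows a child element (tail text) is ignored for brevity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(docs):
    """Build three toy inverted indexes over a dict of {doc_id: xml_string}.

    Hypothetical layout (an assumption, not taken from the paper):
    every element is identified by (doc_id, path), where path records the
    tag and child position of each ancestor, so no schema is required.
    """
    content = defaultdict(set)     # word -> elements whose own text contains it
    structure = defaultdict(set)   # tag name -> elements with that tag
    attributes = defaultdict(set)  # (attr name, attr value) -> elements

    def walk(doc_id, elem, path):
        key = (doc_id, path)
        structure[elem.tag].add(key)
        for name, value in elem.attrib.items():
            attributes[(name, value)].add(key)
        # Only the text directly inside the element is indexed here.
        for word in (elem.text or "").lower().split():
            content[word].add(key)
        for i, child in enumerate(elem):
            walk(doc_id, child, f"{path}/{child.tag}[{i}]")

    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        walk(doc_id, root, f"/{root.tag}[0]")
    return content, structure, attributes


def query(index, word=None, tag=None, attr=None):
    """Return elements matching every criterion given (set intersection)."""
    content, structure, attributes = index
    results = None
    for table, key in ((content, word), (structure, tag), (attributes, attr)):
        if key is None:
            continue
        hits = table.get(key, set())
        results = hits if results is None else results & hits
    return results or set()


# Two differently structured sample documents (invented for illustration).
DOCS = {
    "d1": '<text><div type="chapter"><head>Indexing</head>'
          '<p lang="en">structured documents</p></div></text>',
    "d2": '<text><div type="poem"><l>documents of verse</l></div></text>',
}

index = build_index(DOCS)
print(query(index, word="documents", tag="p"))
print(query(index, tag="div", attr=("type", "poem")))
```

In this sketch a combined query is a plain intersection of posting sets, which is one simple way to mix content, structure, and attribute conditions over heterogeneous documents. The indexing scheme and evaluation strategy proposed in the paper may differ substantially.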
Keywords :
document handling; full-text databases; indexing; page description languages; query languages; query processing; HTML; SGML; TEI Guidelines; XML; attribute queries; content queries; documentation representation standards; encoding standard; humanities; indexing model; markup tags; query evaluation time; query language; schema-independent model; structure queries; structured documents; text database; text model; Database languages; Database systems; Documentation; Encoding; Guidelines
Conference_Title :
Research and Technology Advances in Digital Libraries, 1998. ADL 98. Proceedings. IEEE International Forum on
Conference_Location :
Santa Barbara, CA
Print_ISBN :
0-8186-8464-X
DOI :
10.1109/ADL.1998.670383