DocumentCode :
2087505
Title :
Searching and browsing collections of structural information
Author :
Wolff, Jens E.; Flörke, Holger; Cremers, Armin B.
Author_Institution :
Inst. für Inf. III, Bonn Univ., Germany
fYear :
2000
fDate :
2000
Firstpage :
141
Lastpage :
150
Abstract :
This paper proposes a new approach to querying collections of structured textual information such as SGML/XML documents. Knowledge about the structure of documents is an additional resource that should be exploited during retrieval, since the semantics of the different textual objects can be used to specify an information need much more precisely. However, the traditional probabilistic retrieval model lacks the ability to handle structural information. We define a new retrieval function based on the probabilistic model which overcomes this drawback. The presented query language allows the assignment of structural roles to individual terms. The efficient evaluation of queries in this framework requires appropriate index structures. We design text and structure indexes and show how their information is combined during evaluation. The implementation supports additional functionality, such as a table of contents for browsing. First evaluation results show the feasibility of the approach on collections of unstructured documents.
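The abstract does not spell out the retrieval function or the index layout, so the following is only a rough illustration of the general idea of combining a text index with a structure index. It assumes a toy setting: a text index of term positions, a structure index of element intervals, and a query of (term, role) pairs in which a role restricts a term to occurrences inside elements of that name. The weighting is a generic BM25-style formula chosen for illustration; it is not the retrieval function defined in the paper, and every name here (`build_indexes`, `role_tf`, `score`, `EXAMPLE_DOCS`) is hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy representation: a document is a token list plus the
# structural elements (name, start, end) that cover token positions [start, end).
Doc = tuple[list[str], list[tuple[str, int, int]]]


def build_indexes(docs: dict[str, Doc]):
    """Build a text index (term -> doc -> positions) and a structure index
    (element name -> doc -> list of (start, end) intervals)."""
    text_index = defaultdict(lambda: defaultdict(list))
    struct_index = defaultdict(lambda: defaultdict(list))
    for doc_id, (tokens, elements) in docs.items():
        for pos, tok in enumerate(tokens):
            text_index[tok.lower()][doc_id].append(pos)
        for name, start, end in elements:
            struct_index[name][doc_id].append((start, end))
    return text_index, struct_index


def role_tf(term, role, doc_id, text_index, struct_index):
    """Count occurrences of `term` in `doc_id`; when `role` is given, count only
    positions that fall inside an element with that name."""
    positions = text_index.get(term.lower(), {}).get(doc_id, [])
    if role is None:
        return len(positions)
    intervals = struct_index.get(role, {}).get(doc_id, [])
    return sum(1 for p in positions if any(s <= p < e for s, e in intervals))


def score(query, docs, text_index, struct_index, k=1.2):
    """Rank documents for a query of (term, role) pairs.

    Each matching pair contributes idf * saturated tf (a BM25-like weight
    without length normalisation). Document frequency is computed per
    (term, role) pair, so a term only counts where it has the requested role."""
    n_docs = len(docs)
    scores = {}
    for doc_id in docs:
        total = 0.0
        for term, role in query:
            tf = role_tf(term, role, doc_id, text_index, struct_index)
            if tf == 0:
                continue
            df = sum(1 for d in docs
                     if role_tf(term, role, d, text_index, struct_index) > 0)
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            total += idf * tf * (k + 1) / (tf + k)
        if total > 0:
            scores[doc_id] = total
    # Highest score first; ties keep collection order because sorting is stable.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical example: "xml" occurs in the title of d1 but only in the body of d2.
EXAMPLE_DOCS = {
    "d1": ("XML retrieval with structure".split(), [("title", 0, 4)]),
    "d2": ("cooking notes ; XML appears only here".split(),
           [("title", 0, 2), ("body", 2, 7)]),
}

text_index, struct_index = build_indexes(EXAMPLE_DOCS)
title_hits = score([("xml", "title")], EXAMPLE_DOCS, text_index, struct_index)  # only d1
any_hits = score([("xml", None)], EXAMPLE_DOCS, text_index, struct_index)       # d1 and d2
```

The linear scan over intervals in `role_tf` only keeps the sketch short; a practical structure index would more likely store nested element regions so that containment checks do not scan every interval.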
Keywords :
Internet; hypermedia markup languages; information needs; information resources; information retrieval; page description languages; query languages; SGML; XML; browsing; probabilistic retrieval model; query language; searching; structural information collections; unstructured documents; Cost accounting; Database languages; Markup languages; Publishing; Usability; World Wide Web
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advances in Digital Libraries, 2000. Proceedings. IEEE
Conference_Location :
Washington, DC
Print_ISBN :
0-7695-0659-3
Type :
conf
DOI :
10.1109/ADL.2000.848377
Filename :
848377