DocumentCode :
2557748
Title :
Rhetorical structure theory for content-based indexing and retrieval of Web documents
Author :
Marir, Farhi ; Haouam, Kamel
Author_Institution :
Sch. of Informatics & Multimedia Technol., North London Univ., UK
fYear :
2004
fDate :
28 June-1 July 2004
Firstpage :
160
Lastpage :
164
Abstract :
The amount of information available on the Internet is currently growing at an incredible rate. However, the lack of efficient indexing is still a major barrier to effective information retrieval on the Web. This paper presents the design of a technique for content-based indexing and retrieval of relevant documents from a large collection of documents such as the Internet. The technique aims at improving the quality of retrieval by capturing the semantics of the documents. It introduces a thematic relationship between parts of text using a linguistics theory called rhetorical structure theory (RST), which relies on cue phrases to determine the set of rhetorical relations. Once these structures are determined, they can be saved into a database. We can then query that collection using not only keywords, as traditional information retrieval systems do, but also rhetorical relations. The indexing and retrieval technique described in this paper is under development, and initial results on a small number of documents have been very successful.
Keywords :
Internet; computational linguistics; content-based retrieval; document handling; indexing; information retrieval systems; linguistics; natural languages; Internet; Web documents; World Wide Web; content-based indexing; content-based retrieval; cue phrases; document indexing; document semantics; information retrieval systems; linguistics theory; rhetorical relations; rhetorical structure theory; thematic relationship; Content based retrieval; Databases; Frequency; Indexing; Informatics; Information retrieval; Internet; Maintenance; Natural language processing; Search engines;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Technology: Research and Education, 2004. ITRE 2004. 2nd International Conference on
Print_ISBN :
0-7803-8625-6
Type :
conf
DOI :
10.1109/ITRE.2004.1393667
Filename :
1393667