DocumentCode :
3101599
Title :
Subword-based text retrieval
Author :
Hahn, Udo ; Honeck, Martin ; Schulz, Stefan
Author_Institution :
Text Knowledge Eng. Lab, Freiburg Univ., Germany
fYear :
2003
fDate :
6-9 Jan. 2003
Abstract :
Document retrieval in languages with a rich and complex morphology - particularly in terms of derivation and (single-word) composition - suffers from serious performance degradation with the stemming-only query-term-to-text-word-matching paradigm. We propose an alternative approach in which morphologically complex word forms are segmented into relevant subwords (such as stems, prefixes, suffixes), and subwords constitute the basic unit for indexing and retrieval. We evaluate our approach on a large biomedical document collection.
Keywords :
indexing; information retrieval; text analysis; word processing; biomedical document collection; document retrieval; stemming-only query-term-to-text-word-matching paradigm; subword-based text retrieval; Biomedical informatics; Blood; Degradation; Hospitals; Knowledge engineering; Morphology; Natural languages; Performance analysis
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Proceedings of the 36th Annual Hawaii International Conference on System Sciences, 2003
Print_ISBN :
0-7695-1874-5
Type :
conf
DOI :
10.1109/HICSS.2003.1174249
Filename :
1174249