Title :
Subword-based text retrieval
Author :
Hahn, Udo ; Honeck, Martin ; Schulz, Stefan
Author_Institution :
Text Knowledge Eng. Lab, Freiburg Univ., Germany
Abstract :
Document retrieval in languages with a rich and complex morphology, particularly in terms of derivation and (single-word) composition, suffers from serious performance degradation under the stemming-only paradigm of matching query terms to text words. We propose an alternative approach in which morphologically complex word forms are segmented into relevant subwords (such as stems, prefixes, and suffixes), and these subwords constitute the basic unit for indexing and retrieval. We evaluate our approach on a large biomedical document collection.
Keywords :
indexing; information retrieval; text analysis; word processing; biomedical document collection; document retrieval; stemming-only query-term-to-text-word-matching paradigm; subword-based text retrieval; Biomedical informatics; Blood; Degradation; Hospitals; Knowledge engineering; Morphology; Natural languages; Performance analysis
Conference_Title :
Proceedings of the 36th Annual Hawaii International Conference on System Sciences, 2003
Print_ISBN :
0-7695-1874-5
DOI :
10.1109/HICSS.2003.1174249
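
Illustration :
The abstract describes segmenting morphologically complex words into subwords and using those subwords as the unit of indexing and retrieval. The following is only a minimal sketch of that general idea, not the authors' implementation. The toy lexicon `SUBWORDS`, the greedy longest-match strategy, the fallback to the whole token, and the AND semantics in `search` are all assumptions made for illustration; none of them is stated in this record.

```python
from collections import defaultdict

# Hypothetical toy subword lexicon (assumption; the paper's lexicon is not given here).
SUBWORDS = {"gastr", "o", "enter", "itis", "hepat", "nephr"}


def segment(word, lexicon=SUBWORDS):
    """Greedy longest-match segmentation without backtracking.

    Returns the list of subwords, or [] if the greedy match gets stuck.
    """
    word = word.lower()
    parts, i = [], 0
    while i < len(word):
        # Try the longest remaining substring first, then shorter ones.
        for j in range(len(word), i, -1):
            if word[i:j] in lexicon:
                parts.append(word[i:j])
                i = j
                break
        else:
            return []  # no lexicon entry starts at position i
    return parts


def units(token):
    """Index units for a token: its subwords, or the lowercased token as a fallback."""
    return segment(token) or [token.lower()]


def build_index(docs):
    """Build an inverted index mapping each subword to the set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            for sub in units(token):
                index[sub].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every subword of the query (AND semantics)."""
    result = None
    for token in query.split():
        for sub in units(token):
            hits = index.get(sub, set())
            result = set(hits) if result is None else result & hits
    return result or set()


docs = {1: "Gastroenteritis", 2: "Hepatitis", 3: "Enteritis"}
index = build_index(docs)
print(segment("gastroenteritis"))
print(search(index, "enteritis"))
```

In this sketch the query "enteritis" also matches the document containing only the compound "Gastroenteritis", because both share the subwords "enter" and "itis". A stemming-only matcher that compares whole word forms would not relate the two.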