DocumentCode :
3290649
Title :
Research on Hybrid Index for Chinese IR
Author :
Chen, Chen ; Li, Sheng ; Qi, Haoliang ; Yang, Muyun ; Zhao, Tiejun
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin
Volume :
4
fYear :
2008
fDate :
18-20 Oct. 2008
Firstpage :
606
Lastpage :
610
Abstract :
Identifying the terms used as index units is essential when processing Chinese documents and queries in IR. This paper proposes new kinds of hybrid index that combine words and bigrams. Such a hybrid index can reduce the impact of out-of-vocabulary words and segmentation ambiguity on Chinese IR, because the dictionary is applied to detect segmentation ambiguities flexibly rather than through a rigid ambiguity table. Experiments show that the new hybrid index is not only comparable with bigram indexing but also improves retrieval efficiency.
Keywords :
dictionaries; document handling; indexing; information retrieval; natural language processing; vocabulary; Chinese IR; Chinese documents; bigrams indexing; dictionary; hybrid index; out-of-vocabulary; retrieval efficiency; segmentation ambiguity; Computer science; Dictionaries; Fuzzy systems; Indexing; Information processing; Infrared detectors; Merging; Natural languages; Chinese information retrieval; Hybrid Index
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location :
Jinan, Shandong
Print_ISBN :
978-0-7695-3305-6
Type :
conf
DOI :
10.1109/FSKD.2008.146
Filename :
4666456