DocumentCode
3105073
Title
A Heuristic Approach for Segmentation Granularity Problem in Chinese Information Retrieval
Author
Fan, Ding ; Bin, Wang ; Sili, Wang
fYear
2007
fDate
22-24 Aug. 2007
Firstpage
87
Lastpage
91
Abstract
In Chinese information retrieval, documents are usually segmented into words and then indexed by those words. However, the segmentation granularity problem (SDP) must be considered, because small granularity may lead to low precision and efficiency, while large granularity may cause low recall. To address the problem, this paper proposes an intuitive heuristic approach. A two-level index is built over the segmentation dictionary, through which the original query word can be expanded with its weighted overlaid words. The method not only preserves the precision advantage of large granularity but also overcomes its disadvantage in recall. Experimental results show that the approach slightly but consistently outperforms the baseline.
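The query expansion idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the index layout or the weighting scheme, so everything here is an assumption for illustration: "overlaid words" are taken to be shorter dictionary words contained in a longer one, the second index level maps each word to those contained words, and the names build_two_level_index, expand_query, and the fixed sub_weight of 0.5 are hypothetical.

```python
from collections import defaultdict


def build_two_level_index(dictionary):
    """Build a toy two-level index over a segmentation dictionary.

    Level 1 is the set of dictionary words. Level 2 maps a word to the
    shorter dictionary words it contains (assumed meaning of "overlaid").
    """
    words = set(dictionary)
    overlaid = defaultdict(list)
    for word in words:
        # Enumerate every proper substring and keep those that are words.
        for start in range(len(word)):
            for end in range(start + 1, len(word) + 1):
                sub = word[start:end]
                if sub != word and sub in words and sub not in overlaid[word]:
                    overlaid[word].append(sub)
    return words, dict(overlaid)


def expand_query(word, overlaid, sub_weight=0.5):
    """Return (term, weight) pairs: the original word plus its overlaid words.

    The original word keeps weight 1.0; sub_weight is an assumed constant,
    not the weighting used in the paper.
    """
    expanded = [(word, 1.0)]
    expanded.extend((sub, sub_weight) for sub in overlaid.get(word, []))
    return expanded
```

With a dictionary containing 中华人民共和国, 中华, 人民, 共和, and 共和国, a query for the long word would be expanded with the four shorter words at the lower weight, so documents indexed only under the shorter words can still be matched.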
Keywords
Computers; Dictionaries; Frequency; Indexing; Information retrieval; Information technology; Large-scale systems; Natural languages; Particle separators
fLanguage
English
Publisher
IEEE
Conference_Titel
Advanced Language Processing and Web Information Technology, 2007. ALPIT 2007. Sixth International Conference on
Conference_Location
Luoyang, Henan, China
Print_ISBN
978-0-7695-2930-1
Type
conf
DOI
10.1109/ALPIT.2007.46
Filename
4460620