Title :
Research of segmentation of Chinese texts in Chinese search engine
Author_Institution :
Inst. of Comput. Technol., Acad. Sinica, Beijing, China
Abstract :
Segmenting Chinese text into words is a very difficult problem. This paper presents a framework for a Chinese Internet search engine and discusses the characteristics and difficulties of Chinese text segmentation in such engines. It concludes that the correctness of segmentation is the most important requirement, puts forward tactics for disambiguating segmentation strings and for handling new unknown words and stop words, and presents methods that ensure the consistency of Chinese segmentation.
Keywords :
Internet; search engines; text analysis; Chinese Internet search engine; Chinese text segmentation; Chinese words; new unknown words; processing disambiguation; segmentation strings; stop words; Computers; Content based retrieval; Context modeling; Dictionaries; Indexing; Information retrieval; Natural languages; Sorting
Conference_Title :
Systems, Man, and Cybernetics, 2001 IEEE International Conference on
Conference_Location :
Tucson, AZ
Print_ISBN :
0-7803-7087-2
DOI :
10.1109/ICSMC.2001.972960