Title :
Chinese chunking based on frequently used words
Author :
Qi, Quan ; Liu, Li
Author_Institution :
Sch. of Comput., Beijing Inst. of Technol., Beijing, China
Abstract :
Chinese chunking is the task of automatically segmenting Chinese sentences into small chunks that carry semantic meaning. To improve chunking performance, we propose an approach based on frequently used words (FUW). We use conditional random fields for chunking and modify the training corpus according to the frequency of the words in it. Finally, we devise an experiment to evaluate how the number of FUW affects the performance of conditional random fields. The experiment shows that the approach can help Chinese chunking, but the number of FUW should be selected carefully to achieve high performance at low training cost.
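The abstract does not say how the training corpus is modified, so the following is only a minimal sketch of one plausible reading: keep the k most frequent words as lexical features and replace every other word with its part-of-speech tag before CRF training. The function names `select_fuw` and `rewrite_corpus`, the (word, POS, chunk tag) tuple layout, the toy sentences, and the replacement rule itself are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def select_fuw(corpus, k):
    """Return the k most frequent words in the corpus as a set (hypothetical helper)."""
    counts = Counter(word for sentence in corpus for word, _pos, _tag in sentence)
    return {word for word, _count in counts.most_common(k)}


def rewrite_corpus(corpus, fuw):
    """Keep FUW unchanged; replace any other word with its POS tag.

    The replacement rule is an assumption; the abstract only says the corpus
    is modified according to word frequency.
    """
    return [
        [(word if word in fuw else pos, pos, tag) for word, pos, tag in sentence]
        for sentence in corpus
    ]


# Toy corpus: each token is (word, POS tag, chunk tag). Invented data.
corpus = [
    [("我", "r", "B-NP"), ("喜欢", "v", "B-VP"), ("北京", "ns", "B-NP")],
    [("我", "r", "B-NP"), ("喜欢", "v", "B-VP"), ("音乐", "n", "B-NP")],
]

fuw = select_fuw(corpus, k=2)
modified = rewrite_corpus(corpus, fuw)
```

Under this reading, k is the "count of FUW" that the experiment varies: a larger k preserves more lexical detail and enlarges the CRF feature space, while a smaller k shrinks the feature space at the risk of losing useful word identities.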
Keywords :
natural language processing; Chinese chunking; Chinese sentence segmentation; frequently used words; semantic meanings; Costs; Entropy; Frequency; Hidden Markov models; Learning systems; Machining; Performance gain; Testing; Text recognition; conditional random fields
Conference_Title :
Computer Engineering and Technology (ICCET), 2010 2nd International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-6347-3
DOI :
10.1109/ICCET.2010.5485492