Regional Information Center for Science and Technology (RICeST) - Chinese chunking based on frequently used words

DocumentCode :

518039

Title :

Chinese chunking based on frequently used words

Author :

Qi, Quan ; Liu, Li

Author_Institution :

Sch. of Comput., Beijing Inst. of Technol., Beijing, China

Volume :

fYear :

2010

fDate :

16-18 April 2010

Abstract :

Chinese chunking is the task of automatically segmenting Chinese sentences into small chunks that carry semantic meaning. To improve chunking performance, we propose an approach based on frequently used words (FUW). We use conditional random fields for chunking and modify the training corpus according to the frequency of the words in it. Finally, we devise an experiment to evaluate how the number of FUW affects the performance of conditional random fields. The experiment shows that the approach can help Chinese chunking, but the number of FUW must be chosen carefully to balance high performance against low training cost.
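The abstract does not say how the training corpus is modified, so the following is only a minimal sketch of one plausible reading: keep the N most frequent words as they are and replace every other word with a placeholder before the data is handed to a conditional random field trainer. The function names, the `<RARE>` placeholder, the toy sentences, and the (word, POS, chunk tag) layout are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

# Toy corpus (hypothetical): each sentence is a list of (word, POS, chunk tag).
corpus = [
    [("我", "PN", "B-NP"), ("喜欢", "VV", "B-VP"), ("北京", "NR", "B-NP")],
    [("我", "PN", "B-NP"), ("喜欢", "VV", "B-VP"), ("上海", "NR", "B-NP")],
]


def top_frequent_words(sentences, n):
    """Return the set of the n most frequent words in the corpus."""
    counts = Counter(word for sent in sentences for word, _pos, _chunk in sent)
    return {word for word, _count in counts.most_common(n)}


def mask_rare_words(sentences, fuw, placeholder="<RARE>"):
    """Replace every word outside the FUW set with a placeholder.

    Assumption: this is one possible way to 'modify the corpus according
    to word frequency'; the paper may do something different.
    """
    return [
        [(word if word in fuw else placeholder, pos, chunk)
         for word, pos, chunk in sent]
        for sent in sentences
    ]


def to_columns(sentence):
    """Render one sentence as whitespace-separated columns, one token per line."""
    return "\n".join(" ".join(token) for token in sentence)


fuw = top_frequent_words(corpus, 2)
masked = mask_rare_words(corpus, fuw)
print(to_columns(masked[0]))
```

Varying the second argument of `top_frequent_words` corresponds to the quantity the experiment studies: a larger FUW set keeps more lexical detail in the training data, while a smaller one reduces the number of distinct word features the model has to learn.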

Keywords :

natural language processing; Chinese chunking; Chinese sentence segmentation; frequently used words; semantic meanings; Costs; Entropy; Frequency; Hidden Markov models; Learning systems; Machining; Performance gain; Testing; Text recognition; conditional random fields

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2010 2nd International Conference on Computer Engineering and Technology (ICCET)

Conference_Location :

Chengdu

Print_ISBN :

978-1-4244-6347-3

Type :

conf

DOI :

10.1109/ICCET.2010.5485492

Filename :

5485492

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=518039