DocumentCode :
3102237
Title :
How unsupervised learning affects character tagging based Chinese Word Segmentation: A quantitative investigation
Author :
Song, Yan ; Kit, Chunyu ; Xu, Ruifeng ; Zhao, Hai
Author_Institution :
Dept. of Chinese, Translation & Linguistics, City Univ. of Hong Kong, Kowloon, China
Volume :
6
fYear :
2009
fDate :
12-15 July 2009
Firstpage :
3481
Lastpage :
3486
Abstract :
Integrating global information from unsupervised segmentation into Conditional Random Fields (CRF) learning has proved effective in enhancing the performance of character-tagging-based Chinese Word Segmentation. By comparing CRF models with and without unsupervised learning enhancement, we investigate how unsupervised learning affects performance. In particular, two kinds of segmented words, in-vocabulary and out-of-vocabulary words, are analyzed separately, case by case, to see which portion of those words is affected by unsupervised learning. In addition, the cost of the additional features derived from unsupervised segmentation is taken into account and evaluated.
Keywords :
learning (artificial intelligence); natural language processing; CRF model; Chinese word segmentation; character tagging; conditional random fields learning; global information; unsupervised learning enhancement; unsupervised segmentation; Cybernetics; Machine learning; Tagging; Unsupervised learning; frequent substring extraction; in-vocabulary words; out-of-vocabulary words
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2009 International Conference on
Conference_Location :
Baoding
Print_ISBN :
978-1-4244-3702-3
Electronic_ISBN :
978-1-4244-3703-0
Type :
conf
DOI :
10.1109/ICMLC.2009.5212769
Filename :
5212769