Title :
How unsupervised learning affects character tagging based Chinese Word Segmentation: A quantitative investigation
Author :
Song, Yan ; Kit, Chunyu ; Xu, Ruifeng ; Zhao, Hai
Author_Institution :
Dept. of Chinese, Translation & Linguistics, City Univ. of Hong Kong, Kowloon, China
Abstract :
Integrating global information from unsupervised segmentation into Conditional Random Fields (CRF) learning has proven effective in enhancing the performance of character-tagging-based Chinese Word Segmentation. By comparing CRF models with and without unsupervised learning enhancement, we investigate how unsupervised learning affects performance. In particular, two kinds of segmented words, in-vocabulary and out-of-vocabulary words, are analyzed separately, case by case, to see which of those words are affected by unsupervised learning. In addition, the cost of the additional features derived from unsupervised segmentation is taken into account and evaluated.
Keywords :
learning (artificial intelligence); natural language processing; CRF model; Chinese word segmentation; character tagging; conditional random fields learning; global information; unsupervised learning enhancement; unsupervised segmentation; Cybernetics; Machine learning; Tagging; Unsupervised learning; frequent substring extraction; in-vocabulary words; out-of-vocabulary words
Conference_Title :
Machine Learning and Cybernetics, 2009 International Conference on
Conference_Location :
Baoding
Print_ISBN :
978-1-4244-3702-3
Electronic_ISBN :
978-1-4244-3703-0
DOI :
10.1109/ICMLC.2009.5212769