Title :
Integrated Chinese word segmentation and part-of-speech tagging based on the divide-and-conquer strategy
Author :
Sun, Maosong ; Xu, Dongliang ; Tsou, Benjamin K.
Author_Institution :
Dept. of Comput. Sci., Tsinghua Univ., Beijing, China
Abstract :
This paper tests and compares several ways of integrating Chinese word segmentation and part-of-speech tagging, including so-called true integration and pseudo-integration, on a test corpus of 367,114 Chinese characters. A novel true-integration approach, named "the divide-and-conquer integration", is proposed. Preliminary experiments show that this approach achieves 98.72% accuracy for word segmentation, 95.65% accuracy for part-of-speech tagging, and 94.43% accuracy for word segmentation and part-of-speech tagging combined, slightly outperforming all other combinations tested, though the difference is not very significant. The results demonstrate the potential for further improving the performance of Chinese word segmentation and part-of-speech tagging.
Keywords :
linguistics; natural languages; speech recognition; Chinese characters; Chinese word segmentation; divide-and-conquer integration; part-of-speech tagging; Computer science; Frequency; Hidden Markov models; Intelligent systems; Natural languages; Sun; System testing; Tagging;
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
DOI :
10.1109/NLPKE.2003.1275978