DocumentCode :
2665766
Title :
Integrated Chinese word segmentation and part-of-speech tagging based on the divide-and-conquer strategy
Author :
Maosong, Sun ; Dongliang, Xu ; Tsou, Benjamin K.
Author_Institution :
Dept. of Comput. Sci., Tsinghua Univ., Beijing, China
fYear :
2003
fDate :
26-29 Oct. 2003
Firstpage :
610
Lastpage :
615
Abstract :
In this paper, various ways of integrating Chinese word segmentation and part-of-speech tagging, including so-called true integration and pseudo-integration, are tested and compared on a test corpus of 367,114 Chinese characters. A novel true-integration approach, named 'the divide-and-conquer integration', is proposed. Preliminary experiments show that this true integration achieves 98.72% accuracy in word segmentation, 95.65% accuracy in part-of-speech tagging, and 94.43% accuracy in word segmentation and part-of-speech tagging combined, outperforming all the other combinations to some extent, though not by a significant margin. The results demonstrate the potential for further improving the performance of Chinese word segmentation and part-of-speech tagging.
Keywords :
linguistics; natural languages; speech recognition; Chinese characters; Chinese word segmentation; divide-and-conquer integration; part-of-speech tagging; Computer science; Frequency; Hidden Markov models; Intelligent systems; Natural languages; Sun; System testing; Tagging;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
Type :
conf
DOI :
10.1109/NLPKE.2003.1275978
Filename :
1275978