Title :
Chinese POS tagging based on maximum entropy model
Author :
Zhao, Jian ; Wang, Xiao-long
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., China
Abstract :
Part-of-speech (POS) tagging is a basic task in natural language processing, and tagging precision strongly affects the results of later processing stages such as syntactic analysis. This paper presents a Chinese POS tagger based on the maximum entropy model. The tagger is trained on a large corpus annotated with Chinese POS tags and assigns the best tag sequence to the Chinese sentence to be annotated. In this model, all the features that are useful for predicting the POS tags are mined to make the model closer to the real case. In addition, to address overfitting, a smoothing method and a POS dictionary are used to reduce the model's dependence on the training data and to improve the efficiency of the search process. Open test results show that Chinese POS tagging with this method achieves an accuracy of 96.8%.
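The scoring step of a conditional maximum entropy tagger, with candidate tags restricted by a POS dictionary, can be sketched roughly as follows. This is a minimal illustration only and not the authors' implementation: the feature strings, the weights, the tag set, and the function names (tag_distribution, candidates_for) are all hypothetical, and the smoothing method and the sequence search mentioned in the abstract are not reproduced here.

```python
import math


def tag_distribution(features, candidate_tags, weights):
    """Return p(tag | context) under a conditional maximum entropy model.

    Each (feature, tag) pair has a weight; a tag's score is the sum of the
    weights of the features that fire in the current context.
    """
    scores = {
        tag: sum(weights.get((feat, tag), 0.0) for feat in features)
        for tag in candidate_tags
    }
    # Subtract the largest score before exponentiating for numerical stability.
    top = max(scores.values())
    exp_scores = {tag: math.exp(score - top) for tag, score in scores.items()}
    z = sum(exp_scores.values())  # normalization constant Z(context)
    return {tag: value / z for tag, value in exp_scores.items()}


def candidates_for(word, pos_dictionary, all_tags):
    """Limit the search to tags listed for the word; fall back to all tags."""
    return pos_dictionary.get(word, all_tags)


# Toy values, assumed purely for illustration.
ALL_TAGS = ["n", "v", "a"]
POS_DICTIONARY = {"工作": ["n", "v"]}
WEIGHTS = {
    ("word=工作", "n"): 0.4,
    ("word=工作", "v"): 0.9,
    ("prev_tag=d", "v"): 1.2,
}

features = ["word=工作", "prev_tag=d"]
dist = tag_distribution(
    features, candidates_for("工作", POS_DICTIONARY, ALL_TAGS), WEIGHTS
)
best_tag = max(dist, key=dist.get)
```

Restricting the candidates through the dictionary means the normalization runs over two tags instead of the full tag set, which is one plausible reading of how a POS dictionary could speed up the search process.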
Keywords :
feature extraction; grammars; maximum entropy methods; natural languages; smoothing methods; Chinese language; Chinese sentence; dictionary; features selection; maximum entropy model; natural language processing; part of speech tagging; smoothing model; tag sequence; Computer science; Data mining; Entropy; Hidden Markov models; Natural language processing; Probability distribution; Smoothing methods; Speech; Tagging; Training data;
Conference_Title :
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
Print_ISBN :
0-7803-7508-4
DOI :
10.1109/ICMLC.2002.1174406