Title :
A method of Part-Of-Speech guessing of Chinese Unknown Words based on combined features
Author :
Zhang, Hai-Jun ; Shi, Shu-min ; Feng, Chong ; Huang, He-yan
Author_Institution :
Sch. of Comput. Sci. & Technol., Univ. of Sci. & Technol. of China, Hefei, China
Abstract :
Part-of-speech (POS) guessing of unknown words is an essential phase in unknown word identification. This paper applies combined features (both external and internal) to POS guessing of Chinese unknown words under a conditional random field (CRF) model. To achieve high precision in POS guessing, the paper proposes integrating the Chinese radical, as a new internal feature of Chinese characters, into the existing feature set. Experiments show that combined features are effective for POS guessing and that the new feature significantly improves performance (precision of up to 94.67%). The results also show that the Chinese radical is an effective internal feature for lexical analysis and has practical value.
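As a rough illustration of what "combined features" could look like in practice, the sketch below builds one feature dictionary per unknown word, mixing internal features (characters, length, radicals) with external ones (neighboring words). The function name, the feature names, and the six-entry radical table are all assumptions made for this example; the abstract does not specify the paper's actual feature templates.

```python
# Toy character-to-radical table (illustrative only; a real system would
# load a full radical dictionary).
RADICALS = {"河": "氵", "湖": "氵", "树": "木", "林": "木", "吃": "口", "喝": "口"}


def combined_features(word, prev_word, next_word):
    """Return a feature dict for one unknown word (hypothetical feature names)."""
    return {
        # Internal features: derived from the word itself.
        "first_char": word[0],
        "last_char": word[-1],
        "length": len(word),
        "first_radical": RADICALS.get(word[0], "NONE"),
        "last_radical": RADICALS.get(word[-1], "NONE"),
        # External features: derived from the surrounding context.
        "prev_word": prev_word,
        "next_word": next_word,
    }


features = combined_features("河林", "在", "里")
```

Dictionaries of this shape are a common input format for CRF toolkits, which would then learn weights over the features jointly with the POS label sequence.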
Keywords :
graph theory; natural language processing; Chinese unknown word; Chinese word segmentation; conditional random field model; lexical analysis; part-of-speech guessing; unknown words identification; Computer science; Cybernetics; Dictionaries; Information processing; Machine learning; Natural languages; Search engines; Tagging; Voting; CRF; POS guessing; Unknown words;
Conference_Title :
Machine Learning and Cybernetics, 2009 International Conference on
Conference_Location :
Baoding
Print_ISBN :
978-1-4244-3702-3
Electronic_ISBN :
978-1-4244-3703-0
DOI :
10.1109/ICMLC.2009.5212477