DocumentCode :
499021
Title :
A method of Part-Of-Speech guessing of Chinese Unknown Words based on combined features
Author :
Zhang, Hai-Jun ; Shi, Shu-min ; Feng, Chong ; Huang, He-yan
Author_Institution :
Sch. of Comput. Sci. & Technol., Univ. of Sci. & Technol. of China, Hefei, China
Volume :
1
fYear :
2009
fDate :
12-15 July 2009
Firstpage :
328
Lastpage :
332
Abstract :
Part-of-speech (POS) guessing of unknown words is an essential phase in unknown word identification. This paper applies combined features (both external and internal) to POS guessing of Chinese unknown words under a conditional random field (CRF) model. To achieve high precision in POS guessing, it proposes integrating the Chinese radical, a new internal feature of Chinese characters, into the existing feature set. Experiments show that combined features are effective for POS guessing and that the new feature significantly improves performance (precision of up to 94.67%). The results also show that the Chinese radical is an effective internal feature for lexical analysis and has practical value.
Keywords :
graph theory; natural language processing; Chinese unknown word; Chinese word segmentation; conditional random field model; lexical analysis; part-of-speech guessing; unknown words identification; Computer science; Cybernetics; Dictionaries; Information processing; Machine learning; Natural languages; Search engines; Tagging; Voting; CRF; POS guessing; Unknown words
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics, 2009 International Conference on
Conference_Location :
Baoding
Print_ISBN :
978-1-4244-3702-3
Electronic_ISBN :
978-1-4244-3703-0
Type :
conf
DOI :
10.1109/ICMLC.2009.5212477
Filename :
5212477