DocumentCode
499021
Title
A method of Part-Of-Speech guessing of Chinese Unknown Words based on combined features
Author
Zhang, Hai-Jun ; Shi, Shu-min ; Feng, Chong ; Huang, He-yan
Author_Institution
Sch. of Comput. Sci. & Technol., Univ. of Sci. & Technol. of China, Hefei, China
Volume
1
fYear
2009
fDate
12-15 July 2009
Firstpage
328
Lastpage
332
Abstract
Part-of-speech (POS) guessing of unknown words is an essential phase of unknown word identification. This paper applies combined features (namely, both external and internal features) to POS guessing of Chinese unknown words under a conditional random field (CRF) model. To achieve high precision in POS guessing, the paper proposes integrating the Chinese radical, as a new internal feature of Chinese characters, into the existing feature set. Experiments show that combined features are effective for POS guessing and that the new feature significantly improves performance (precision of up to 94.67%). The results also show that the Chinese radical, as an effective internal feature in lexical analysis, has some practical value.
Keywords
graph theory; natural language processing; Chinese unknown word; Chinese word segmentation; conditional random field model; lexical analysis; part-of-speech guessing; unknown words identification; Computer science; Cybernetics; Dictionaries; Information processing; Machine learning; Natural languages; Search engines; Tagging; Voting; CRF; POS guessing; Unknown words
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2009 International Conference on
Conference_Location
Baoding
Print_ISBN
978-1-4244-3702-3
Electronic_ISBN
978-1-4244-3703-0
Type
conf
DOI
10.1109/ICMLC.2009.5212477
Filename
5212477