• DocumentCode
    499021
  • Title
    A method of Part-Of-Speech guessing of Chinese Unknown Words based on combined features
  • Author
    Zhang, Hai-Jun ; Shi, Shu-min ; Feng, Chong ; Huang, He-yan
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Univ. of Sci. & Technol. of China, Hefei, China
  • Volume
    1
  • fYear
    2009
  • fDate
    12-15 July 2009
  • Firstpage
    328
  • Lastpage
    332
  • Abstract
    Part-of-speech (POS) guessing of unknown words is an essential phase of unknown-word identification. This paper applies combined features (both external and internal) to POS guessing of Chinese unknown words under a conditional random field (CRF) model. To achieve high precision, the paper proposes integrating the Chinese radical, a new internal feature of Chinese characters, into the existing feature set. Experiments show that combined features are effective for POS guessing and that the new feature significantly improves performance (precision reaches up to 94.67%). The results also show that the Chinese radical is an effective internal feature for lexical analysis and has practical value.
  • Keywords
    graph theory; natural language processing; Chinese unknown word; Chinese word segmentation; conditional random field model; lexical analysis; part-of-speech guessing; unknown words identification; Computer science; Cybernetics; Dictionaries; Information processing; Machine learning; Natural languages; Search engines; Tagging; Voting; CRF; POS guessing; Unknown words
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Machine Learning and Cybernetics, 2009 International Conference on
  • Conference_Location
    Baoding
  • Print_ISBN
    978-1-4244-3702-3
  • Electronic_ISBN
    978-1-4244-3703-0
  • Type
    conf
  • DOI
    10.1109/ICMLC.2009.5212477
  • Filename
    5212477
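
The combined features described in the abstract (internal features of the word itself, including the radical, plus external context features) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' feature set: the function name `combined_features`, the feature names, the padding tokens, and the tiny `RADICALS` table are all hypothetical, and the record does not say how the paper encodes its CRF input.

```python
# Hypothetical, hand-made radical table for illustration only;
# a real system would need a full character-to-radical resource.
RADICALS = {"河": "氵", "树": "木", "吃": "口", "打": "扌", "妈": "女"}


def combined_features(words, i):
    """Build a feature dict for the word at index i of a segmented sentence.

    Internal features look inside the word (characters, length, radicals);
    external features look at the neighbouring words.
    """
    word = words[i]
    return {
        # Internal features (assumed names)
        "first_char": word[0],
        "last_char": word[-1],
        "length": len(word),
        "first_radical": RADICALS.get(word[0], "UNK"),
        "last_radical": RADICALS.get(word[-1], "UNK"),
        # External features (assumed names and padding tokens)
        "prev_word": words[i - 1] if i > 0 else "<BOS>",
        "next_word": words[i + 1] if i + 1 < len(words) else "<EOS>",
    }


sentence = ["我", "吃", "苹果"]
features = [combined_features(sentence, i) for i in range(len(sentence))]
```

A list of such dicts, one per word, is the kind of input a CRF toolkit typically accepts alongside the gold POS labels; characters missing from the radical table fall back to the placeholder "UNK".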