• DocumentCode
    2665344
  • Title
    Automatic Chinese unknown word extraction using small-corpus-based method
  • Author
    Chang, Tao-Hsing ; Lee, Chia-Hoang
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Nat. Chiao Tung Univ., Hsinchu, Taiwan
  • fYear
    2003
  • fDate
    26-29 Oct. 2003
  • Firstpage
    459
  • Lastpage
    464
  • Abstract
    Chinese unknown word extraction is an important problem in Chinese language processing, and it poses two main difficulties. First, almost any Chinese character can either stand alone as a word or form part of other words. Second, Chinese text has no spaces between words to mark their boundaries. Several approaches have been proposed, but they have drawbacks. Here, we present a method that extracts Chinese unknown words more efficiently and precisely. It retains its efficiency and accuracy even when the document set available for training is small, and it can also extract unknown words that occur rarely. These advantages make it practical for real applications.
  • Keywords
    character recognition; linguistics; natural languages; word processing; Chinese language processing; Chinese unknown word extraction; corpus-based method; Data mining; Information science; Natural languages; Sun; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
  • Conference_Location
    Beijing, China
  • Print_ISBN
    0-7803-7902-0
  • Type
    conf
  • DOI
    10.1109/NLPKE.2003.1275950
  • Filename
    1275950