• DocumentCode
    607286
  • Title

    Chinese personal name recognition using N-gram model and rules

  • Author

    Chen Lin ; Zhang Hui ; Li Zhen'an

  • Author_Institution
    State Key Lab. of Software Dev. Environ., Beihang Univ., Beijing, China
  • fYear
    2012
  • fDate
    3-5 Dec. 2012
  • Firstpage
    450
  • Lastpage
    453
  • Abstract
    Chinese personal name recognition plays an important role in Chinese word segmentation, but the complexity of Chinese names makes it difficult to decide whether a sequence of characters is a name. This paper presents a new algorithm based on an N-gram model and recognition rules to address this problem. To increase efficiency and accuracy, we also build several dictionaries, such as a surname dictionary and a person-name dictionary. Experiments on different corpora show that the improved tokenizer using this algorithm performs stably and improves word segmentation accuracy by more than 10 percent over the original one. On average, the improved tokenizer's recall rate and accuracy rate are both over 92%.
  • Keywords
    natural language processing; pattern recognition; text analysis; word processing; Chinese personal name recognition; Chinese word segmentation; N-gram model; characters sequence; person-name dictionary; recognition rules; surname dictionary; tokenizer
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computing and Convergence Technology (ICCCT), 2012 7th International Conference on
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4673-0894-6
  • Type

    conf

  • Filename
    6530375