DocumentCode
607286
Title
Chinese personal name recognition using N-gram model and rules
Author
Chen Lin ; Zhang Hui ; Li Zhen'an
Author_Institution
State Key Lab. of Software Dev. Environ., Beihang Univ., Beijing, China
fYear
2012
fDate
3-5 Dec. 2012
Firstpage
450
Lastpage
453
Abstract
Chinese personal name recognition plays an important role in Chinese word segmentation, but the complexity of Chinese names makes it difficult to decide whether a sequence of characters is a name. This paper presents a new algorithm based on an N-gram model and recognition rules to address this problem. To increase efficiency and accuracy, we also build several dictionaries, such as a surname dictionary and a person-name dictionary. Experiments on different corpora show that the improved tokenizer using this algorithm performs stably and improves word segmentation accuracy by more than 10 percent over the original tokenizer. On average, the improved tokenizer's recall and accuracy are both over 92%.
Keywords
natural language processing; pattern recognition; text analysis; word processing; Chinese personal name recognition; Chinese word segmentation; N-gram model; characters sequence; person-name dictionary; recognition rules; surname dictionary; tokenizer
fLanguage
English
Publisher
IEEE
Conference_Titel
Computing and Convergence Technology (ICCCT), 2012 7th International Conference on
Conference_Location
Seoul
Print_ISBN
978-1-4673-0894-6
Type
conf
Filename
6530375