Title :
Exploiting rich features for Chinese named entity recognition
Author :
Shen, Jianping ; Wang, Xuan ; Li, Shaofeng ; Yao, Lin
Author_Institution :
Comput. Applic. Res. Center, Harbin Inst. of Technol., Shenzhen, China
Abstract :
In this paper we design a multiple-feature template for a CRF-based Chinese named entity recognizer. The template includes basic features, prefix and suffix features, dictionary features, and combined features. We first apply pre-processing steps such as POS tagging and dictionary-based chunking. For the dictionary features, unlike the work reported in the literature, different proportions of the dictionaries are used in training and testing, especially for the person name, location name, and organization name dictionaries. For these three named entity dictionaries, the training dictionaries are only a subset of the testing dictionaries. Empirical results show that the multiple-feature template is comprehensive and that using different proportions of some dictionaries in training and testing improves performance significantly. Our final system achieves an F-measure of 91.27% on the MSRA test corpus, which is better than the SIGHAN 2006 results on the same test corpus.
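The dictionary-feature idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the actual feature templates or dictionary proportions, so the function names, the feature keys, the 50% proportion, and the sample names below are all hypothetical. The sketch only shows the general shape: the training dictionary is sampled as a subset of the full (testing) dictionary, and each character receives basic, dictionary, and combined features.

```python
import random


def sample_training_dictionary(full_dictionary, proportion, seed=0):
    """Return a random subset of the full dictionary for training.

    Hypothetical helper: the proportion actually used in the paper
    is not stated in the abstract.
    """
    entries = sorted(full_dictionary)  # sort so sampling is reproducible
    rng = random.Random(seed)
    subset_size = int(len(entries) * proportion)
    return set(rng.sample(entries, subset_size))


def dictionary_flags(sentence, dictionary, max_len=4):
    """Flag every character covered by a dictionary entry in the sentence."""
    flags = [False] * len(sentence)
    for start in range(len(sentence)):
        longest_end = min(len(sentence), start + max_len)
        for end in range(start + 1, longest_end + 1):
            if sentence[start:end] in dictionary:
                for i in range(start, end):
                    flags[i] = True
    return flags


def char_features(sentence, index, in_dict):
    """Build an illustrative feature dictionary for one character."""
    ch = sentence[index]
    prev_ch = sentence[index - 1] if index > 0 else "<BOS>"
    next_ch = sentence[index + 1] if index < len(sentence) - 1 else "<EOS>"
    return {
        "char": ch,                                  # basic feature
        "prev_char": prev_ch,                        # basic context feature
        "next_char": next_ch,                        # basic context feature
        "in_person_dict": in_dict[index],            # dictionary feature
        "char+in_person_dict": f"{ch}|{in_dict[index]}",  # combined feature
    }


# Hypothetical person-name dictionary; the testing side uses all of it,
# while the training side sees only a sampled part.
full_person_dict = {"王小明", "李华", "张伟", "刘洋"}
train_person_dict = sample_training_dictionary(full_person_dict, 0.5)

sentence = "王小明去北京"
test_flags = dictionary_flags(sentence, full_person_dict)
first_char_features = char_features(sentence, 0, test_flags)
```

Under these assumptions, a model trained with the smaller dictionary sees entity names that the dictionary feature does not cover, so it cannot rely on the dictionary flag alone; this is one plausible reading of why the abstract reports a gain from the mismatch in proportions.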
Keywords :
character recognition; dictionaries; CRF model; Chinese named entity recognition; dictionary feature; organization name dictionary; preprocessing procedure; Artificial neural networks; Dictionaries; Feature extraction; Humans; Organizations; Testing; Training; CRF; Feature; Named Entity Recognition;
Conference_Title :
Intelligent Systems and Knowledge Engineering (ISKE), 2010 International Conference on
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4244-6791-4
DOI :
10.1109/ISKE.2010.5680864