Title :
Word-level Chinese named entity recognition based on segmentation digraph
Author :
Gao, Hong ; Huang, Degen ; Yang, Yuansheng
Author_Institution :
Dept. of Comput. Sci. & Eng., Dalian Univ. of Technol., China
fDate :
30 Oct.-1 Nov. 2005
Abstract :
This paper presents a statistical method for recognizing word-level Chinese named entities based on a segmentation digraph. The main idea is to generate named entity (NE) candidates according to their internal characteristics; NE candidates with high confidence are added to the segmentation digraph of a Chinese string as vertices, alongside lexical word candidates. When ambiguities occur, a bigram model and a trigram model are used to evaluate each path of the segmentation digraph. The shortest path is selected as the optimal segmentation of the Chinese string, and only the NEs on that path are recognized. The method was evaluated on the Peking University corpus, and the results show that it is simple and effective.
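The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the function names (`build_candidates`, `make_bigram_cost`, `shortest_path`), the confidence threshold, the probability floor for unseen bigrams, and the choice to score an NE by its class tag rather than its surface string. Only a bigram cost is used; the trigram model mentioned in the abstract is omitted. Candidates are represented as spans `(start, end, token, tag)`, and the cheapest path is found by dynamic programming over string positions.

```python
import math
from collections import defaultdict

START = (-1, 0, "<s>", "W")  # hypothetical sentinel vertex ending at position 0


def build_candidates(text, lexicon, ne_candidates, threshold=0.5):
    """Collect digraph vertices as spans (start, end, token, tag)."""
    spans = []
    n = len(text)
    for i in range(n):
        # Single characters are always kept so that at least one path exists.
        spans.append((i, i + 1, text[i], "W"))
        for j in range(i + 2, n + 1):
            if text[i:j] in lexicon:
                spans.append((i, j, text[i:j], "W"))
    # Only NE candidates with high enough confidence enter the digraph.
    for start, end, tag, conf in ne_candidates:
        if conf >= threshold:
            spans.append((start, end, text[start:end], tag))
    return spans


def make_bigram_cost(probs, floor=1e-4):
    """Return cost(prev, cur) = -log P(cur | prev), flooring unseen bigrams."""
    def key(span):
        # Assumption: an NE is scored by its class tag, a word by its surface form.
        return span[2] if span[3] == "W" else span[3]

    def cost(prev, cur):
        return -math.log(probs.get((key(prev), key(cur)), floor))

    return cost


def shortest_path(text, spans, bigram_cost):
    """Return the lowest-cost segmentation as a list of (token, tag) pairs."""
    n = len(text)
    by_start = defaultdict(list)
    for span in spans:
        by_start[span[0]].append(span)
    best = {START: (0.0, None)}  # span -> (path cost, predecessor span)
    ends_at = defaultdict(list)  # position -> reachable spans ending there
    ends_at[0].append(START)
    for pos in range(n):
        for prev in ends_at[pos]:
            prev_cost = best[prev][0]
            for cur in by_start[pos]:
                c = prev_cost + bigram_cost(prev, cur)
                if cur not in best:
                    ends_at[cur[1]].append(cur)
                    best[cur] = (c, prev)
                elif c < best[cur][0]:
                    best[cur] = (c, prev)
    # Pick the cheapest vertex that ends at the last position and backtrack.
    node = min(ends_at[n], key=lambda s: best[s][0])
    path = []
    while node != START:
        path.append((node[2], node[3]))
        node = best[node][1]
    return path[::-1]


if __name__ == "__main__":
    text = "张三在北京工作"
    lexicon = {"北京", "工作"}            # toy lexicon, not from the paper
    ne_candidates = [(0, 2, "PER", 0.9)]  # toy person-name candidate
    spans = build_candidates(text, lexicon, ne_candidates)
    print(shortest_path(text, spans, make_bigram_cost({})))
```

With an empty bigram table every transition has the same cost, so the sketch simply prefers the path with the fewest vertices; a trained model would replace the toy table. An NE is reported only if its span lies on the selected path, which mirrors the last step of the abstract.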
Keywords :
computational linguistics; natural languages; statistical analysis; bigram model; segmentation digraph; statistic method; trigram model; word-level Chinese named entity recognition; Character generation; Computer science; Data mining; Hidden Markov models; Humans; Natural language processing; Natural languages; Paper technology; Performance evaluation; Statistics
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
Print_ISBN :
0-7803-9361-9
DOI :
10.1109/NLPKE.2005.1598766