DocumentCode
1909838
Title
Automatic Recognition of Chinese Organization Name Based on Conditional Random Fields
Author
Zhang, Suxiang ; Zhang, Suxian ; Wang, Xiaojie
Author_Institution
North China Electr. Power Univ., Baoding
fYear
2007
fDate
Aug. 30 2007-Sept. 1 2007
Firstpage
229
Lastpage
233
Abstract
Person, location, and organization names have long been identified as bottlenecks for named entity recognition (NER) systems, and automatic recognition of Chinese organization names is the most difficult of these tasks. This paper presents a new approach to Chinese organization name recognition based on cascaded conditional random fields (CRFs). In the proposed approach, person names and location names are recognized first, before organization names. The model is built as a cascade: the results of the lower-level model are passed to the higher-level model to support its decisions when recognizing complicated organization names. We also propose a new feature for this task. We evaluate our approach on a large-scale corpus in an open test using People's Daily (January 1998). For Chinese organization names, recall reaches 88.78% and precision reaches 82.35%. The evaluation results show that our approach based on cascaded conditional random fields significantly outperforms previous approaches.
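The cascade described in the abstract, where a lower-level model tags person and location names and its output feeds the higher-level organization model, can be sketched as feature construction for the higher-level CRF. This is a minimal, hypothetical illustration that assumes BIO-tagged lower-level output and a simple character-window template; the function name `org_features`, the feature names, and the window size are assumptions, not the feature set used in the paper. The output uses the list-of-dicts-per-token shape accepted by CRF toolkits such as sklearn-crfsuite.

```python
from typing import Dict, List


def org_features(chars: List[str], low_tags: List[str], window: int = 1) -> List[Dict[str, str]]:
    """Build one feature dict per character for a higher-level (ORG) CRF.

    Hypothetical sketch: the feature template is illustrative only.
    chars    -- the sentence, one Chinese character per element
    low_tags -- BIO tags from the lower-level PER/LOC model,
                e.g. "B-LOC", "I-LOC", "B-PER", "O"
    """
    if len(chars) != len(low_tags):
        raise ValueError("chars and low_tags must have the same length")
    n = len(chars)
    feats: List[Dict[str, str]] = []
    for i in range(n):
        f: Dict[str, str] = {"bias": "1"}
        for off in range(-window, window + 1):
            j = i + off
            if 0 <= j < n:
                # Character identity in the context window.
                f[f"char[{off}]"] = chars[j]
                # Cascade feature: the lower-level model's decision at position j.
                f[f"low[{off}]"] = low_tags[j]
            else:
                # Positions outside the sentence get a padding value.
                f[f"char[{off}]"] = "<PAD>"
                f[f"low[{off}]"] = "<PAD>"
        feats.append(f)
    return feats


# Usage: the lower-level model has tagged the first two characters as a location,
# which the ORG model can combine with a following suffix such as "大学".
example = org_features(list("北京大学"), ["B-LOC", "I-LOC", "O", "O"])
```

Exposing the lower-level tags as ordinary CRF features, rather than hard-coding rules over them, lets the higher-level model learn how strongly a recognized location or person name signals a surrounding organization name.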
Keywords
information retrieval; natural languages; random processes; text analysis; Chinese organization recognition; cascaded conditional random field; information extraction; large-scale corpus; named entity recognition system; question answering system; text document; Character recognition; Data mining; Educational institutions; Hidden Markov models; Large-scale systems; Machine learning; Power engineering and energy; Sun; Testing; Text recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-1611-0
Electronic_ISBN
978-1-4244-1611-0
Type
conf
DOI
10.1109/NLPKE.2007.4368038
Filename
4368038