Title :
Chinese organization name recognition based on co-training algorithm
Author :
Ke Xiao ; Li Shaozi
Author_Institution :
Dept. of Cognitive Sci., Xiamen Univ., Xiamen, China
Abstract :
Organization name recognition is the most difficult part of named entity recognition. To reduce reliance on tagged corpora and to exploit large amounts of untagged text, we first apply the semi-supervised machine learning algorithm co-training, combined with the conditional random fields (CRF) model and support vector machines (SVM), to Chinese organization name recognition. Following the principles of compatibility and uncorrelatedness, we construct classifiers in two ways: from different views of the CRF model, and from two models, CRF and SVM, treated as two views. We then present a heuristic algorithm for selecting untagged samples. The experimental results show that, for the same F-measure, co-training uses only about 30% of the tagged data required by a single statistical model, and that, with the same amount of tagged data, co-training raises the F-measure by about 10% over a single statistical model.
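The abstract does not spell out the training loop. As a rough illustration only, a generic two-view co-training loop (in the style of Blum and Mitchell) can be sketched as below. This is not the authors' implementation: the CRF and SVM models are replaced by a hypothetical toy nearest-centroid classifier (`CentroidClassifier`), the paper's heuristic sample-selection algorithm is replaced by plain confidence-margin selection, and the names `co_train`, `predict`, `rounds`, and `per_round` are illustrative assumptions.

```python
# Minimal two-view co-training sketch (Blum & Mitchell style).
# Assumption: a toy nearest-centroid classifier stands in for the paper's
# CRF/SVM models, and margin-based selection stands in for its heuristic.
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Example = Tuple[Vector, Vector]  # (view 1 features, view 2 features)


class CentroidClassifier:
    """Hypothetical stand-in classifier: assigns the label of the nearest class centroid."""

    def fit(self, xs: Sequence[Vector], ys: Sequence[int]) -> "CentroidClassifier":
        self.centroids: Dict[int, Vector] = {}
        for label in set(ys):
            members = [x for x, y in zip(xs, ys) if y == label]
            self.centroids[label] = tuple(
                sum(m[i] for m in members) / len(members) for i in range(len(members[0]))
            )
        return self

    def predict_with_confidence(self, x: Vector) -> Tuple[int, float]:
        # Squared distance to every centroid; confidence is the gap between
        # the nearest and second-nearest centroid (larger gap = more confident).
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, c)), label)
            for label, c in self.centroids.items()
        )
        best_dist, best_label = dists[0]
        margin = dists[1][0] - best_dist if len(dists) > 1 else float("inf")
        return best_label, margin


def _fit_view(labeled: List[Tuple[Example, int]], view: int) -> CentroidClassifier:
    # Train a classifier that only sees one view of each labeled example.
    return CentroidClassifier().fit([x[view] for x, _ in labeled], [y for _, y in labeled])


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """Grow the labeled set by letting each view's classifier label the
    untagged examples it is most confident about."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        h1, h2 = _fit_view(labeled, 0), _fit_view(labeled, 1)
        for view, h in ((0, h1), (1, h2)):
            # Rank the remaining untagged pool by this view's confidence.
            ranked = sorted(pool, key=lambda x: h.predict_with_confidence(x[view])[1], reverse=True)
            for x in ranked[:per_round]:
                label, _ = h.predict_with_confidence(x[view])
                labeled.append((x, label))  # pseudo-label now trains both views
                pool.remove(x)
    return _fit_view(labeled, 0), _fit_view(labeled, 1), labeled


def predict(h1: CentroidClassifier, h2: CentroidClassifier, x: Example) -> int:
    # Combine the two views by trusting whichever one is more confident.
    l1, c1 = h1.predict_with_confidence(x[0])
    l2, c2 = h2.predict_with_confidence(x[1])
    return l1 if c1 >= c2 else l2


# Usage with a tiny made-up dataset: two seed labels, four untagged examples.
seed = [(((0.0, 0.0), (0.0,)), 0), (((1.0, 1.0), (1.0,)), 1)]
untagged = [((0.1, 0.2), (0.1,)), ((0.9, 0.8), (0.95,)),
            ((0.2, 0.1), (0.05,)), ((0.8, 0.9), (0.9,))]
h1, h2, grown = co_train(seed, untagged)
```

Each round refits both classifiers and lets each view pseudo-label its single most confident untagged example. The paper instead uses its own heuristic selection rule, and in its second variant the two "views" are two different models (CRF and SVM) rather than two views of one model.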
Keywords :
character recognition; statistical analysis; support vector machines; Chinese organization name recognition; co-training algorithm; conditional random fields model; named entity recognition; semi-supervised machine learning algorithm; statistical model; tagged corpus; untagged corpus; Character recognition; Cognitive science; Hidden Markov models; Intelligent systems; Knowledge engineering; Learning systems; Machine learning; Support vector machine classification; Support vector machines; Tagging
Conference_Title :
Intelligent System and Knowledge Engineering, 2008. ISKE 2008. 3rd International Conference on
Conference_Location :
Xiamen
Print_ISBN :
978-1-4244-2196-1
Electronic_ISBN :
978-1-4244-2197-8
DOI :
10.1109/ISKE.2008.4731034