Title :
Semi-supervised learning for named entity recognition using weakly labeled training data
Author :
Zafarian, Atefeh ; Rokni, Ali ; Khadivi, Shahram ; Ghiasifard, Sonia
Author_Institution :
Dept. of Comput. Eng. & IT, Amirkabir Univ. of Technol., Tehran, Iran
Abstract :
The shortage of annotated training data remains an important challenge for many Natural Language Processing (NLP) tasks such as Named Entity Recognition (NER). NER requires a large amount of training data with a high degree of human supervision, yet sufficient labeled data is not available for every language. In this paper, we use unlabeled bilingual corpora to extract useful features by transferring information from a resource-rich language to a resource-poor language, and we combine these features with a small amount of training data to build a supervised NER model. We then apply a graph-based semi-supervised learning method that trains a CRF-based supervised classifier on the labeled data and uses high-confidence predictions on the unlabeled data to expand the training set, improving the efficiency of the NER model with the new training set.
Keywords :
feature extraction; graph theory; learning (artificial intelligence); natural language processing; pattern classification; CRF-based supervised classifier; NER supervised model; NLP; annotated training data; graph-based semisupervised learning method; named entity recognition; unlabeled bilingual corpora; weakly labeled training data; Computational modeling; Data models; Organizations; Semisupervised learning; Training; Training data; Bilingual parallel corpora; graph-based semi-supervised learning
Conference_Title :
Artificial Intelligence and Signal Processing (AISP), 2015 International Symposium on
Conference_Location :
Mashhad
Print_ISBN :
978-1-4799-8817-4
DOI :
10.1109/AISP.2015.7123504