DocumentCode :
259581
Title :
Improving Named Entity Recognition for Morphologically Rich Languages Using Word Embeddings
Author :
Demir, Hakan ; Ozgur, Arzucan
Author_Institution :
TUBITAK BILGEM, Gebze, Turkey
fYear :
2014
fDate :
3-6 Dec. 2014
Firstpage :
117
Lastpage :
122
Abstract :
In this paper, we addressed the Named Entity Recognition (NER) problem for morphologically rich languages by employing a semi-supervised learning approach based on neural networks. We adopted a fast unsupervised method for learning continuous vector representations of words, and used these representations along with language independent features to develop a NER system. We evaluated our system for the highly inflectional Turkish and Czech languages. We improved the state-of-the-art F-score obtained for Turkish without using gazetteers by 2.26% and for Czech by 1.53%. Unlike the previous state-of-the-art systems developed for these languages, our system does not make use of any language dependent features. Therefore, we believe it can easily be applied to other morphologically rich languages.
Keywords :
natural language processing; neural nets; unsupervised learning; word processing; Czech languages; F-score; NER problem; NER system; Turkish languages; continuous word vector representations; language independent features; morphologically rich languages; named entity recognition problem; neural networks; semisupervised learning approach; unsupervised learning method; word embeddings; Cities and towns; Context; Measurement; Neural networks; Semisupervised learning; Training; Vectors; Czech NER; Named Entity Recognition; Skip-gram; Turkish NER; Word Embeddings;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Applications (ICMLA), 2014 13th International Conference on
Conference_Location :
Detroit, MI
Type :
conf
DOI :
10.1109/ICMLA.2014.24
Filename :
7033101