DocumentCode
259581
Title
Improving Named Entity Recognition for Morphologically Rich Languages Using Word Embeddings
Author
Demir, Hakan ; Ozgur, Arzucan
Author_Institution
TUBITAK BILGEM, Gebze, Turkey
fYear
2014
fDate
3-6 Dec. 2014
Firstpage
117
Lastpage
122
Abstract
In this paper, we addressed the Named Entity Recognition (NER) problem for morphologically rich languages by employing a semi-supervised learning approach based on neural networks. We adopted a fast unsupervised method for learning continuous vector representations of words, and used these representations along with language-independent features to develop an NER system. We evaluated our system on the highly inflectional Turkish and Czech languages. We improved the state-of-the-art F-score obtained for Turkish without using gazetteers by 2.26% and for Czech by 1.53%. Unlike the previous state-of-the-art systems developed for these languages, our system does not make use of any language-dependent features. Therefore, we believe it can easily be applied to other morphologically rich languages.
Keywords
natural language processing; neural nets; unsupervised learning; word processing; Czech languages; F-score; NER problem; NER system; Turkish languages; continuous word vector representations; language independent features; morphologically rich languages; named entity recognition problem; neural networks; semisupervised learning approach; unsupervised learning method; word embeddings; Cities and towns; Context; Measurement; Neural networks; Semisupervised learning; Training; Vectors; Czech NER; Named Entity Recognition; Skip-gram; Turkish NER; Word Embeddings;
fLanguage
English
Publisher
ieee
Conference_Titel
Machine Learning and Applications (ICMLA), 2014 13th International Conference on
Conference_Location
Detroit, MI
Type
conf
DOI
10.1109/ICMLA.2014.24
Filename
7033101