  • DocumentCode
    259581
  • Title
    Improving Named Entity Recognition for Morphologically Rich Languages Using Word Embeddings
  • Author
    Demir, Hakan ; Ozgur, Arzucan
  • Author_Institution
    TUBITAK BILGEM, Gebze, Turkey
  • fYear
    2014
  • fDate
    3-6 Dec. 2014
  • Firstpage
    117
  • Lastpage
    122
  • Abstract
    In this paper, we addressed the Named Entity Recognition (NER) problem for morphologically rich languages by employing a semi-supervised learning approach based on neural networks. We adopted a fast unsupervised method for learning continuous vector representations of words, and used these representations along with language-independent features to develop an NER system. We evaluated our system on Turkish and Czech, two highly inflectional languages. We improved the state-of-the-art F-score obtained for Turkish without using gazetteers by 2.26%, and the state-of-the-art F-score for Czech by 1.53%. Unlike the previous state-of-the-art systems developed for these languages, our system does not use any language-dependent features. Therefore, we believe it can easily be applied to other morphologically rich languages.
  • Keywords
    natural language processing; neural nets; unsupervised learning; word processing; Czech languages; F-score; NER problem; NER system; Turkish languages; continuous word vector representations; language independent features; morphologically rich languages; named entity recognition problem; neural networks; semisupervised learning approach; unsupervised learning method; word embeddings; Cities and towns; Context; Measurement; Neural networks; Semisupervised learning; Training; Vectors; Czech NER; Named Entity Recognition; Skip-gram; Turkish NER; Word Embeddings;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications (ICMLA), 2014 13th International Conference on
  • Conference_Location
    Detroit, MI
  • Type
    conf
  • DOI
    10.1109/ICMLA.2014.24
  • Filename
    7033101