• DocumentCode
    1645055
  • Title

    A Named Entity Recognition approach for Albanian

  • Author

    Skenduli, Marjana Prifti ; Biba, Marenglen

  • Author_Institution
    Dept. of Comput. Sci., Univ. of New York in Tirana, Tirana, Albania
  • fYear
    2013
  • Firstpage
    1532
  • Lastpage
    1537
  • Abstract
    Named Entity Recognition (NER) deals with identifying personal, geographical, organizational or other entity types in raw text. In this paper we propose the first NER model for the Albanian language. Our model is based on the maximum entropy approach. We manually annotate a corpus in the historical and political domains and train the models to generate classifiers that are able to recognize relevant entities in the text. We achieve good performance for precision and recall on the selected domains, despite the scarcity of Albanian corpora and the fact that this paper marks the first NER research for the Albanian language. Experiments demonstrate that the models can be further improved if a richer training corpus is provided.
  • Keywords
    information retrieval; natural language processing; pattern classification; text analysis; Albanian corpora; Albanian language; NER model; classifier generation; geographical entity identification; historical domains; manual corpus annotation; maximum entropy approach; model training; named entity recognition approach; organizational entity identification; personal entity identification; political domains; precision; raw text; recall; text entity recognition; training corpus; Buildings; Computational modeling; Entropy; Hidden Markov models; Organizations; Pragmatics; Training; Albanian; machine learning; named entity recognition; natural language processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advances in Computing, Communications and Informatics (ICACCI), 2013 International Conference on
  • Conference_Location
    Mysore
  • Print_ISBN
    978-1-4799-2432-5
  • Type

    conf

  • DOI
    10.1109/ICACCI.2013.6637407
  • Filename
    6637407