Title :
A Named Entity Recognition approach for Albanian
Author :
Skenduli, Marjana Prifti ; Biba, Marenglen
Author_Institution :
Dept. of Comput. Sci., Univ. of New York in Tirana, Tirana, Albania
Abstract :
Named Entity Recognition (NER) identifies personal, geographical, organizational, and other entity types in raw text. In this paper we propose the first NER model for the Albanian language. Our model is based on the maximum entropy approach. We manually annotate a corpus in the historical and political domains and train the models to produce classifiers that recognize relevant entities in the text. We achieve good precision and recall on the selected domains, despite the scarcity of Albanian corpora and the fact that this paper marks the first NER research for the Albanian language. Experiments demonstrate that the models can be further improved if a richer training corpus is provided.
Keywords :
information retrieval; natural language processing; pattern classification; text analysis; Albanian corpora; Albanian language; NER model; classifier generation; geographical entity identification; historical domains; manual corpus annotation; maximum entropy approach; model training; named entity recognition approach; organizational entity identification; personal entity identification; political domains; precision; raw text; recall; text entity recognition; training corpus; Buildings; Computational modeling; Entropy; Hidden Markov models; Organizations; Pragmatics; Training; Albanian; machine learning; named entity recognition
Conference_Title :
Advances in Computing, Communications and Informatics (ICACCI), 2013 International Conference on
Conference_Location :
Mysore
Print_ISBN :
978-1-4799-2432-5
DOI :
10.1109/ICACCI.2013.6637407