DocumentCode
1645055
Title
A Named Entity Recognition approach for Albanian
Author
Skenduli, Marjana Prifti ; Biba, Marenglen
Author_Institution
Dept. of Comput. Sci., Univ. of New York in Tirana, Tirana, Albania
fYear
2013
Firstpage
1532
Lastpage
1537
Abstract
Named Entity Recognition (NER) deals with identifying personal, geographical, organizational or other entity types in a raw text. In this paper we propose the first NER model for the Albanian language. Our model is based on the maximum entropy approach. We manually annotate a corpus in the historical and political domains and train the models to generate classifiers that are able to recognize relevant entities in the text. We achieve good performance for precision and recall on the selected domains, despite the scarcity of Albanian corpora and the fact that this paper marks the first NER research for the Albanian language. Experiments demonstrate that the models can be further improved if a richer training corpus is provided.
Keywords
information retrieval; natural language processing; pattern classification; text analysis; Albanian corpora; Albanian language; NER model; classifier generation; geographical entity identification; historical domains; manual corpus annotation; maximum entropy approach; model training; named entity recognition approach; organizational entity identification; personal entity identification; political domains; precision; raw text; recall; text entity recognition; training corpus; Buildings; Computational modeling; Entropy; Hidden Markov models; Organizations; Pragmatics; Training; Albanian; machine learning; named entity recognition; natural language processing
fLanguage
English
Publisher
IEEE
Conference_Titel
Advances in Computing, Communications and Informatics (ICACCI), 2013 International Conference on
Conference_Location
Mysore
Print_ISBN
978-1-4799-2432-5
Type
conf
DOI
10.1109/ICACCI.2013.6637407
Filename
6637407