Regional Information Center for Science and Technology - NLTK tagger for Albanian using iterative approach

DocumentCode :

642356

Title :

NLTK tagger for Albanian using iterative approach

Author :

Kadriu, A.

Author_Institution :

South East Eur. Univ., Tetove, Macedonia

fYear :

2013

fDate :

24-27 June 2013

Firstpage :

283

Lastpage :

288

Abstract :

This paper presents research on a tagging model for Albanian texts built with the NLTK toolkit. The model cascades three taggers with backoff. We use a dictionary of around 32000 words together with their corresponding POS tags, as well as a set of regular expression rules. A lemmatization module is implemented to convert nouns and verbs to their lemmas. The text is first tagged with a unigram tagger based on the dictionary. This serves as the baseline tagger for a regular expression tagger. Incorrectly lemmatized words are then corrected, producing a third, lookup tagger. This tagger is used with the first and second taggers as backoff.
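
The cascade described in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the paper's implementation: the class names, the tag labels (`N`, `V`, `CONJ`, `NUM`), the toy word entries, the single regular expression rule, and the order of the chain (corrections lookup first, then regular expressions, then the dictionary) are all assumptions made for illustration. In NLTK itself, a comparable chain would typically be assembled from `UnigramTagger` and `RegexpTagger`, linked through their `backoff` argument.

```python
import re


class BackoffTagger:
    """Try choose_tag() first; if it returns None, ask the backoff tagger."""

    def __init__(self, backoff=None):
        self.backoff = backoff

    def choose_tag(self, word):
        raise NotImplementedError

    def tag_word(self, word):
        tag = self.choose_tag(word)
        if tag is None and self.backoff is not None:
            tag = self.backoff.tag_word(word)
        return tag

    def tag(self, words):
        return [(word, self.tag_word(word)) for word in words]


class LookupTagger(BackoffTagger):
    """Tags a word by looking up its lowercased form in a table."""

    def __init__(self, table, backoff=None):
        super().__init__(backoff)
        self.table = table

    def choose_tag(self, word):
        return self.table.get(word.lower())


class RegexTagger(BackoffTagger):
    """Tags a word with the tag of the first pattern that matches it fully."""

    def __init__(self, rules, backoff=None):
        super().__init__(backoff)
        self.rules = [(re.compile(pattern), tag) for pattern, tag in rules]

    def choose_tag(self, word):
        for pattern, tag in self.rules:
            if pattern.fullmatch(word):
                return tag
        return None


# Hypothetical toy data; the paper's dictionary has around 32000 entries.
DICTIONARY = {"libër": "N", "shkoj": "V", "dhe": "CONJ"}
RULES = [(r"\d+", "NUM")]
CORRECTIONS = {"libri": "N"}  # a form the dictionary lookup alone would miss

# Assumed chain: corrections lookup -> regular expressions -> dictionary.
dictionary_tagger = LookupTagger(DICTIONARY)
regex_tagger = RegexTagger(RULES, backoff=dictionary_tagger)
tagger = LookupTagger(CORRECTIONS, backoff=regex_tagger)

print(tagger.tag(["libri", "dhe", "25", "xyz"]))
```

A word that none of the three taggers recognizes (here `xyz`) is left with the tag `None`, which makes untagged words easy to collect for a further correction pass.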

Keywords :

dictionaries; iterative methods; natural language processing; text analysis; Albanian language; Albanian text; NLTK tagger; NLTK toolkit; POS tags; dictionary; iterative approach; lemmatize module; lemmatized words; lookup tagger; nouns; regular expressions rules; regular expressions tagger; taggers cascading; tagging model; text tagging; unigram tagger; verbs; Accuracy; Dictionaries; Economics; Hidden Markov models; Mood; Tagging; Training; Albanian language; NLTK; POS tagging;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information Technology Interfaces (ITI), Proceedings of the ITI 2013 35th International Conference on

Conference_Location :

Cavtat

ISSN :

1334-2762

Print_ISBN :

978-953-7138-30-1

Type :

conf

DOI :

10.2498/iti.2013.0565

Filename :

6649039

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=642356