DocumentCode
3289849
Title
Building a Test Collection for Sorani Kurdish
Author
Esmaili, Kyumars Sheykh ; Eliassi, Donya ; Salavati, Shahin ; Aliabadi, Purya ; Mohammadi, Arash ; Yosefi, Somayeh ; Hakimi, S.
Author_Institution
Nanyang Technol. Univ., Singapore, Singapore
fYear
2013
fDate
27-30 May 2013
Firstpage
1
Lastpage
7
Abstract
Despite having a large number of speakers, Sorani - one of the two principal branches of the Kurdish language - is among the less-resourced languages. This paper reports on the outcomes of a project aimed at providing the essential resources for processing Sorani texts. The primary output of this project is Pewan, the first standard test collection for evaluating Sorani Information Retrieval (IR) systems. The other language resources that we have constructed in this project are: (i) a light stemmer, (ii) a list of affixes, and (iii) a list of stopwords. We also used these newly built resources to study the effectiveness of basic IR strategies on Sorani documents. Our experimental results show that normalization and, to a lesser extent, stemming can greatly improve the performance of Sorani IR systems.
Keywords
information retrieval systems; natural language processing; project management; text analysis; Pewan; Sorani Kurdish language; Sorani information retrieval system evaluation; Sorani text processing; affixes; less-resourced languages; light-stemmer; standard test collection; stopwords; Buildings; Educational institutions; Information retrieval; Morphology; Reliability; Standards; Writing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Systems and Applications (AICCSA), 2013 ACS International Conference on
Conference_Location
Ifrane
ISSN
2161-5322
Type
conf
DOI
10.1109/AICCSA.2013.6616470
Filename
6616470
Link To Document