• DocumentCode
    3289849
  • Title
    Building a Test Collection for Sorani Kurdish
  • Author
    Esmaili, Kyumars Sheykh ; Eliassi, Donya ; Salavati, Shahin ; Aliabadi, Purya ; Mohammadi, Arash ; Yosefi, Somayeh ; Hakimi, S.
  • Author_Institution
    Nanyang Technol. Univ., Singapore, Singapore
  • fYear
    2013
  • fDate
    27-30 May 2013
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    Despite having a large number of speakers, Sorani, one of the two principal branches of the Kurdish language, is among the less-resourced languages. This paper reports on the outcomes of a project aimed at providing the essential resources for processing Sorani texts. The primary output of this project is Pewan, the first standard Test Collection to evaluate Sorani Information Retrieval systems. The other language resources that we have constructed in this project are: (i) a light-stemmer, (ii) a list of affixes, and (iii) a list of stopwords. We also used these newly-built resources to study the effectiveness of basic IR strategies on Sorani documents. Our experimental results show that normalization and, to a lesser extent, stemming can greatly improve the performance of Sorani IR systems.
  • Keywords
    information retrieval systems; natural language processing; project management; text analysis; Pewan; Sorani Kurdish language; Sorani information retrieval system evaluation; Sorani text processing; affixes; less-resourced languages; light-stemmer; standard test collection; stopwords; Buildings; Educational institutions; Information retrieval; Morphology; Reliability; Standards; Writing;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Computer Systems and Applications (AICCSA), 2013 ACS International Conference on
  • Conference_Location
    Ifrane
  • ISSN
    2161-5322
  • Type
    conf
  • DOI
    10.1109/AICCSA.2013.6616470
  • Filename
    6616470