  • DocumentCode
    166208
  • Title
    A rule-based Bengali stemmer
  • Author
    Mahmud, Md Redowan; Afrin, Mahbuba; Razzaque, Md Abdur; Miller, Ellis; Iwashige, Joel
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Univ. of Dhaka, Dhaka, Bangladesh
  • fYear
    2014
  • fDate
    24-27 Sept. 2014
  • Firstpage
    2750
  • Lastpage
    2756
  • Abstract
    One of the biggest challenges in Bengali word lookup is deriving the appropriate base word for a given word. The basic approach to this problem is to remove the inflections from a word to obtain its stem. A stemmer reduces an inflected or derived word to its stem or root form. Existing work uses lookup tables for either stem words or suffixes, which increases memory and time overhead. This paper develops a rule-based algorithm that removes inflections step by step without repeatedly searching a dictionary for the desired root. To the best of our knowledge, this paper is the first to show that, in Bengali morphology, the stems of a large set of inflected words can be computed algorithmically by stripping the inflections step by step. The proposed algorithm is independent of inflected word length, and our evaluation shows an accuracy of around 88%.
  • Keywords
    information retrieval; knowledge based systems; natural language processing; Bengali stemmer; information retrieval system; rule-based algorithm; stemming process; word root form; Accuracy; Classification algorithms; Databases; Dictionaries; Informatics; Bengali; Inflections; Rule based Stemming; Stem word; Stemmer; Verb-root
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI)
  • Conference_Location
    New Delhi
  • Print_ISBN
    978-1-4799-3078-4
  • Type
    conf
  • DOI
    10.1109/ICACCI.2014.6968484
  • Filename
    6968484
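The abstract describes removing Bengali inflections step by step with rules instead of dictionary lookups. The Python sketch below illustrates that general idea only, under stated assumptions: the two rule layers, the handful of suffixes in them (a few case markers, plural markers, and classifiers), the longest-match-first order, and the MIN_STEM_LEN threshold are hypothetical choices made for illustration, not the paper's actual rule set or algorithm.

```python
from typing import Sequence

# HYPOTHETICAL rule layers, ordered from the outermost inflection inward.
# These few suffixes are illustrative only; they are NOT the paper's rule set.
RULE_LAYERS: Sequence[Sequence[str]] = (
    ("দের", "কে", "তে"),                 # a few case markers
    ("গুলো", "গুলি", "রা", "টা", "টি"),  # plural markers and classifiers
)

MIN_STEM_LEN = 2  # assumed minimum stem length, counted in Unicode code points


def strip_layer(word: str, suffixes: Sequence[str]) -> str:
    """Remove the longest matching suffix of one layer, if a long enough stem remains."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word  # no rule in this layer applies


def stem(word: str) -> str:
    """Strip inflections layer by layer; no dictionary is consulted at any step."""
    for layer in RULE_LAYERS:
        word = strip_layer(word, layer)
    return word


if __name__ == "__main__":
    for w in ("ছেলেগুলোকে", "ছেলেদের", "বইটা", "ঘর"):
        print(w, "->", stem(w))
```

For example, stem("ছেলেগুলোকে") first drops the case marker কে and then the plural marker গুলো, leaving ছেলে ("boy"), and stem("বইটা") drops the classifier টা to give বই. A realistic rule set needs many more suffixes plus context conditions: a bare one-letter rule such as the genitive র, for instance, would wrongly shorten বাজার ("market") to বাজা.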