• DocumentCode
    3376899
  • Title
    A rule-based approach of stemming for inflectional and derivational words in Bengali
  • Author
    Das, Suprabhat ; Mitra, Pabitra
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Kharagpur, India
  • fYear
    2011
  • fDate
    14-16 Jan. 2011
  • Firstpage
    134
  • Lastpage
    136
  • Abstract
    Stemming is the process of reducing inflected or derived words to their stem or root form. This paper presents an approach for finding stems in Bengali text; Bengali is a highly inflectional language. In our process, we first strip the suffix from each Bengali word using suffix-stripping rules chosen according to the type of suffix. We then check whether the suffix-stripped word is a valid root word using a Bengali dictionary. We tested the process on the Bengali collection of the FIRE 2010 data set with 50 queries, using Lucene as the search engine, and it gives quite satisfactory results in terms of recall and MAP value.
  • Keywords
    dictionaries; natural language processing; search engines; word processing; Bengali dictionary; Bengali words; FIRE 2010 data set; Lucene; MAP value; derived word reduction; inflectional word reduction; rule-based approach; search engine; stemming; suffix stripping rules; Compounds; Computational linguistics; Dictionaries; Fires; Indexes; Search engines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Students' Technology Symposium (TechSym), 2011 IEEE
  • Conference_Location
    Kharagpur
  • Print_ISBN
    978-1-4244-8941-1
  • Type
    conf
  • DOI
    10.1109/TECHSYM.2011.5783841
  • Filename
    5783841
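
The two-step process described in the abstract (strip a suffix by rule, then validate the result against a dictionary) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the suffix list, the toy dictionary, the longest-suffix-first ordering, the single stripping pass, and the fallback of returning the word unchanged when no candidate is found in the dictionary are all assumptions, since the abstract does not give the actual rules.

```python
import unicodedata

# Hypothetical suffix list, for illustration only. The paper's real
# suffix-stripping rules are not given in the abstract.
SUFFIXES = ["গুলো", "ের", "রা", "কে", "টি"]

# Toy stand-in for the Bengali dictionary used to validate root words.
DICTIONARY = {"ছেলে", "বই", "মানুষ"}


def _nfc(text):
    """Normalize to NFC so that equivalent Bengali vowel signs compare equal."""
    return unicodedata.normalize("NFC", text)


def stem(word, suffixes=SUFFIXES, dictionary=DICTIONARY):
    """Strip one suffix and accept the result only if it is a dictionary root.

    Assumption: suffixes are tried longest first, and the word is returned
    unchanged when no stripped candidate is found in the dictionary.
    """
    word = _nfc(word)
    roots = {_nfc(root) for root in dictionary}
    ordered = sorted((_nfc(s) for s in suffixes), key=len, reverse=True)
    for suffix in ordered:
        # Require a non-empty remainder after stripping.
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)]
            if candidate in roots:
                return candidate
    return word


if __name__ == "__main__":
    for token in ["ছেলেরা", "বইগুলো", "মানুষের"]:
        print(token, "->", stem(token))
```

In a retrieval setup like the one the abstract mentions, a function of this kind would be applied to both document tokens and query tokens before indexing and searching, so that inflected forms match the same root.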