DocumentCode :
3376899
Title :
A rule-based approach of stemming for inflectional and derivational words in Bengali
Author :
Das, Suprabhat ; Mitra, Pabitra
Author_Institution :
Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Kharagpur, India
fYear :
2011
fDate :
14-16 Jan. 2011
Firstpage :
134
Lastpage :
136
Abstract :
Stemming is the process of reducing inflected or derived words to their stem or root form. This paper presents an approach for finding the stems of words in Bengali text; Bengali is a highly inflectional language. In our process, we first stripped the suffix from each Bengali word using suffix-stripping rules that depend on the type of suffix. We then checked whether the suffix-stripped word was a valid root word by looking it up in a Bengali dictionary. We tested the process on the Bengali collection of the FIRE 2010 data set with 50 queries, using Lucene as the search engine, and it gave quite satisfactory results in terms of recall and MAP.
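The two steps described in the abstract (strip a suffix, then validate the remainder against a dictionary) can be illustrated with a minimal Python sketch. The suffix list, the lexicon, and the function name `stem` are all assumptions made for illustration; they are not the rules or the dictionary used in the paper, and the sketch ignores the paper's distinction between suffix types.

```python
# Toy suffix list and lexicon: illustrative assumptions only,
# not the suffix-stripping rules or dictionary from the paper.
SUFFIXES = ["গুলো", "ের", "রা", "কে"]
DICTIONARY = {"দেশ", "ছেলে", "বই"}


def stem(word, suffixes=SUFFIXES, dictionary=DICTIONARY):
    """Return a dictionary-validated stem, or the word unchanged if none is found."""
    # A word that is already a known root needs no stripping.
    if word in dictionary:
        return word
    # Try longer suffixes first so that the longest match wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)]
            # Accept the stripped form only if the dictionary confirms it is a root.
            if candidate in dictionary:
                return candidate
    # No rule produced a valid root, so leave the word as it is.
    return word
```

Under these assumptions, `stem("দেশের")` strips the suffix `ের` and returns `দেশ` because that form appears in the toy lexicon, while a word with no validated stem is returned unchanged. The dictionary check is what keeps an over-eager rule from producing a non-word.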
Keywords :
dictionaries; natural language processing; search engines; word processing; Bengali dictionary; Bengali words; FIRE 2010 data set; Lucene; MAP value; derived word reduction; inflectional word reduction; rule-based approach; search engine; stemming; suffix stripping rules; Compounds; Computational linguistics; Dictionaries; Fires; Indexes; Search engines;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Students' Technology Symposium (TechSym), 2011 IEEE
Conference_Location :
Kharagpur
Print_ISBN :
978-1-4244-8941-1
Type :
conf
DOI :
10.1109/TECHSYM.2011.5783841
Filename :
5783841