DocumentCode
166208
Title
A rule based Bengali stemmer
Author
Mahmud, Md Redowan ; Afrin, Mahbuba ; Razzaque, Md Abdur ; Miller, Ellis ; Iwashige, Joel
Author_Institution
Dept. of Comput. Sci. & Eng., Univ. of Dhaka, Dhaka, Bangladesh
fYear
2014
fDate
24-27 Sept. 2014
Firstpage
2750
Lastpage
2756
Abstract
One of the biggest challenges in word lookup is deriving the appropriate base word for a given word in Bengali. The basic idea behind the solution is to eliminate inflections from a given word to derive its stem. A stemmer reduces an inflected or derived word to its stem or root form. Existing works in the literature use lookup tables for either stem words or suffixes, which increases memory and time overheads. This paper develops a rule-based algorithm that eliminates inflections stepwise without repeatedly searching a dictionary for the desired root. To the best of our knowledge, this paper is the first to show that, in Bengali morphology, the stems for a large set of inflections can be computed algorithmically by cutting down the inflections step by step. The proposed algorithm is independent of the length of the inflected word, and our evaluation shows around 88% accuracy.
Keywords
information retrieval; knowledge based systems; natural language processing; Bengali stemmer; information retrieval system; rule-based algorithm; stemming process; word root form; Accuracy; Classification algorithms; Colon; Databases; Dictionaries; Informatics; US Department of Transportation; Bengali; Inflections; Rule based Stemming; Stem word; Stemmer; Verb-root;
fLanguage
English
Publisher
IEEE
Conference_Titel
Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on
Conference_Location
New Delhi
Print_ISBN
978-1-4799-3078-4
Type
conf
DOI
10.1109/ICACCI.2014.6968484
Filename
6968484