Title :
A rule based Bengali stemmer
Author :
Mahmud, Md Redowan ; Afrin, Mahbuba ; Razzaque, Md Abdur ; Miller, Ellis ; Iwashige, Joel
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Dhaka, Dhaka, Bangladesh
Abstract :
One of the biggest challenges in word lookup is deriving the appropriate base word for a given word in Bengali. The basic approach is to eliminate inflections from the word to obtain its stem. Stemmers do this through a stemming process, which reduces an inflected or derived word to its stem or root form. Existing works in the literature use lookup tables for either stem words or suffixes, which increases memory and time overheads. This paper develops a rule-based algorithm that eliminates inflections stepwise without continuously searching for the desired root in a dictionary. To the best of our knowledge, this paper is the first to show that, in Bengali morphology, the stems for a large set of inflections can be computed algorithmically by cutting down the inflections step by step. The proposed algorithm is independent of inflected word length, and our evaluation shows around 88% accuracy.
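The stepwise inflection removal described in the abstract can be illustrated with a minimal sketch. This is not the paper's rule set: the suffix layers in `RULE_LAYERS`, the `MIN_STEM_LEN` guard, and the function names `strip_one` and `stem` are all hypothetical, chosen only to show how a stem might be computed by peeling off one inflection layer at a time without a stem dictionary.

```python
import unicodedata

# Hypothetical rule layers, applied outermost-first. These few common
# Bengali suffixes are illustrative assumptions, NOT the paper's rules.
RULE_LAYERS = [
    ["দের", "ের", "কে", "তে", "র"],  # case markers
    ["গুলো", "গুলি", "রা"],            # plural markers
    ["টি", "টা"],                     # definite articles
]

# Assumed guard: never cut a word below this many code points.
MIN_STEM_LEN = 2


def _nfc(text: str) -> str:
    """Normalize so composed vowel signs compare consistently."""
    return unicodedata.normalize("NFC", text)


def strip_one(word: str, suffixes: list[str]) -> str:
    """Remove the longest matching suffix from one layer, if any."""
    for suffix in sorted((_nfc(s) for s in suffixes), key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def stem(word: str) -> str:
    """Peel inflection layers one step at a time, with no stem lookup."""
    word = _nfc(word)
    for layer in RULE_LAYERS:
        word = strip_one(word, layer)
    return word
```

Calling `stem("বইগুলোর")` walks through the case layer and then the plural layer. The only table involved is the small fixed suffix list, and a real rule set would need many more layers and exceptions, particularly for verb roots.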
Keywords :
information retrieval; knowledge based systems; natural language processing; Bengali stemmer; information retrieval system; rule-based algorithm; stemming process; word root form; Accuracy; Classification algorithms; Colon; Databases; Dictionaries; Informatics; US Department of Transportation; Bengali; Inflections; Rule based Stemming; Stem word; Stemmer; Verb-root;
Conference_Title :
Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on
Conference_Location :
New Delhi
Print_ISBN :
978-1-4799-3078-4
DOI :
10.1109/ICACCI.2014.6968484