DocumentCode :
735887
Title :
Word sense disambiguation in Bengali: A lemmatized system increases the accuracy of the result
Author :
Pal, Alok Ranjan ; Saha, Diganta ; Naskar, Sudip ; Dash, Niladri Sekhar
Author_Institution :
Dept. of Comput. Sc. & Eng., Coll. of Eng. & Mgmt, Kolaghat, India
fYear :
2015
fDate :
9-11 July 2015
Firstpage :
342
Lastpage :
346
Abstract :
This work attempts to disambiguate ambiguous Bengali words using the Naïve Bayes classification algorithm. The task is divided into two modules. In the first module, the algorithm was applied to regular (unlemmatized) text collected from the Bengali text corpus developed in the TDIL project of the Govt. of India, and the disambiguation accuracy was around 80%. In the second module, both the training data and the test data were lemmatized, and the same algorithm then gave around 85% accuracy. The output was verified against a previously tagged output file generated with the help of a Bengali lexical dictionary. The study is relevant to automatic text classification, machine learning, information extraction, and word sense disambiguation.
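The two-module setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a generic multinomial Naïve Bayes classifier with add-one smoothing, run on toy English data, and the `LEMMAS` table, the sense labels, and the function names are all hypothetical stand-ins for a real Bengali lemmatizer and sense inventory.

```python
import math
from collections import Counter, defaultdict

# Hypothetical lemma table, for illustration only. A real system would use
# a Bengali lemmatizer rather than a hand-written dictionary.
LEMMAS = {"rivers": "river", "flows": "flow", "flowed": "flow",
          "loans": "loan", "deposits": "deposit"}


def lemmatize(tokens):
    """Map each token to its lemma, leaving unknown tokens unchanged."""
    return [LEMMAS.get(t, t) for t in tokens]


def train(examples, use_lemmas):
    """Count senses and context words from (context_tokens, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, sense in examples:
        if use_lemmas:
            tokens = lemmatize(tokens)
        sense_counts[sense] += 1
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    return sense_counts, word_counts, vocab


def classify(model, tokens, use_lemmas):
    """Return the sense with the highest log posterior under Naive Bayes."""
    sense_counts, word_counts, vocab = model
    if use_lemmas:
        tokens = lemmatize(tokens)
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense in sorted(sense_counts):  # sorted for a deterministic tie-break
        score = math.log(sense_counts[sense] / total)  # log prior
        denom = sum(word_counts[sense].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # words never seen in training carry no evidence
                score += math.log((word_counts[sense][t] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy training contexts for the ambiguous word "bank" (assumed data).
TRAIN = [
    (["river", "water", "flows"], "shore"),
    (["loan", "money", "deposit"], "finance"),
]

model_with_lemmas = train(TRAIN, use_lemmas=True)
print(classify(model_with_lemmas, ["rivers", "flowed"], use_lemmas=True))
```

In this toy setting, the inflected test words "rivers" and "flowed" match nothing in the unlemmatized training vocabulary, so only the lemmatized model has any evidence to work with. That is one plausible reason lemmatization could raise accuracy, though the sketch does not reproduce the reported figures.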
Keywords :
Bayes methods; natural language processing; pattern classification; text analysis; Bengali lexical dictionary; Bengali text corpus; India; TDIL project; automatic text classification; information extraction; lemmatized system; machine learning; naïve Bayes classification algorithm; word sense disambiguation; Accuracy; Classification algorithms; Computers; Context; Dictionaries; Pragmatics; Probabilistic logic; Bengali Word Sense Disambiguation; Bengali WordNet; Naïve Bayes Classification; Natural Language Processing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International Conference on
Conference_Location :
Kolkata
Type :
conf
DOI :
10.1109/ReTIS.2015.7232902
Filename :
7232902