DocumentCode :
3776141
Title :
Bangla Parts-of-Speech tagging using Bangla stemmer and rule based analyzer
Author :
Md. Nesarul Hoque;Md. Hanif Seddiqui
Author_Institution :
Dept. of Computer Science and Engineering, Port City International University, Chittagong, Bangladesh
fYear :
2015
Firstpage :
440
Lastpage :
444
Abstract :
Parts-of-Speech (POS) tagging plays a vital role in many areas of Natural Language Processing (NLP), such as machine translation, spell checking, information retrieval, speech processing, and emotion analysis. Bangla is a highly inflectional language in which a single word gives rise to many variants. Although a few POS taggers exist for Bangla, very few of them exploit suffixes to identify the tags of words. We therefore propose an automated POS tagging system for Bangla based on word suffixes. Our system uses our own stemming technique to reduce each word to its minimal possible root and applies rules according to the different forms of suffixes. In addition, we incorporate a Bangla vocabulary of more than 45,000 words with their default tags, along with a pattern-based verb data set. These resources help improve the tagging efficiency of the Bangla POS tagger. We evaluate the proposed system on a Bangla text corpus. The results show that the proposed Bangla POS tagger outperforms the known related tagging systems.
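The pipeline the abstract outlines (vocabulary lookup, suffix stripping, suffix-based rules) might be sketched roughly as follows. This is only an illustrative toy, not the authors' implementation: the lexicon entries, the suffix list, the tag names, and the functions `stem` and `tag` are all assumptions made for this example, and the paper's actual stemmer, rules, and verb data set are not reproduced here.

```python
# Toy lexicon: root word -> default tag.
# Hypothetical entries; the paper's vocabulary has more than 45,000 words.
LEXICON = {"ছেলে": "NOUN", "বই": "NOUN", "কর": "VERB", "ভালো": "ADJ"}

# Hypothetical suffix rules: suffix -> tag suggested when the root is unknown.
SUFFIX_RULES = {"দের": "NOUN", "রা": "NOUN", "টি": "NOUN", "ছে": "VERB", "বে": "VERB"}


def stem(word):
    """Strip the longest matching suffix and return (root, suffix).

    Returns (word, "") when no suffix applies.
    """
    for suffix in sorted(SUFFIX_RULES, key=len, reverse=True):
        # Require a non-empty root so a bare suffix is never stripped to "".
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""


def tag(word):
    """Tag one word: lexicon first, then stemmed root, then suffix rule."""
    if word in LEXICON:
        return LEXICON[word]
    root, suffix = stem(word)
    if root in LEXICON:
        return LEXICON[root]
    if suffix:
        return SUFFIX_RULES[suffix]
    return "UNK"  # no evidence from either the lexicon or the suffix rules


def tag_sentence(sentence):
    """Tag a whitespace-tokenized sentence, returning (word, tag) pairs."""
    return [(w, tag(w)) for w in sentence.split()]
```

In this sketch the lexicon's default tag takes priority over the suffix rule, so a suffix only decides the tag when the stemmed root is unknown. Whether the paper resolves conflicts in that order is not stated in the abstract.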
Keywords :
"Dictionaries","Tagging","Hidden Markov models","Natural language processing","Vocabulary","Training data","Speech"
Publisher :
IEEE
Conference_Titel :
Computer and Information Technology (ICCIT), 2015 18th International Conference on
Type :
conf
DOI :
10.1109/ICCITechn.2015.7488111
Filename :
7488111