DocumentCode
3776141
Title
Bangla Parts-of-Speech tagging using Bangla stemmer and rule based analyzer
Author
Md. Nesarul Hoque;Md. Hanif Seddiqui
Author_Institution
Dept. of Computer Science and Engineering, Port City International University, Chittagong, Bangladesh
fYear
2015
Firstpage
440
Lastpage
444
Abstract
Parts-of-Speech (POS) tagging plays a vital role in many Natural Language Processing (NLP) applications, such as machine translation, spell checking, information retrieval, speech processing and emotion analysis. Bangla is a highly inflectional language in which many variants are derived from a single word. Although a few POS taggers exist for Bangla, very few of them exploit word suffixes to identify the tags of words. We therefore propose an automated, suffix-based POS tagging system for Bangla. Our system uses our own stemming technique to retrieve the minimal possible root of each word and applies rules according to the different forms of suffixes. Moreover, we incorporate a Bangla vocabulary of more than 45,000 words with their default tags and a pattern-based verb data set, both of which help improve the tagging efficiency of the tagger. We evaluate the proposed system on a Bangla text corpus. The results show that our proposed Bangla POS tagger outperforms the known related tagging systems.
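The abstract describes the tagging pipeline only at a high level. A word is first looked up in the vocabulary. Otherwise it is stemmed to a minimal root, and suffix rules are applied. The Python sketch below illustrates that general idea under explicit assumptions:

- The four-entry VOCAB, the five SUFFIX_RULES and the NN/VB/JJ/UNK tag labels are hypothetical examples. They are not the authors' 45,000-word vocabulary, rule set or tagset.
- The helper names stem, tag_word and tag_sentence are invented for illustration.
- The pattern-based verb data set is not modeled.

```python
# Hypothetical sketch of suffix-driven Bangla POS tagging; not the paper's code.
from __future__ import annotations

# Hypothetical default-tag vocabulary (root -> tag); illustrative entries only.
VOCAB = {
    "বই": "NN",    # boi, "book"
    "ছেলে": "NN",  # chhele, "boy"
    "কর": "VB",    # kor, verb root "do"
    "ভালো": "JJ",  # bhalo, "good"
}

# Hypothetical suffix rules (suffix, tag); not the authors' rule set.
SUFFIX_RULES = [
    ("গুলো", "NN"),  # plural marker -gulo
    ("লাম", "VB"),   # past tense, first person -lam
    ("ছি", "VB"),    # present continuous, first person -chhi
    ("কে", "NN"),    # objective case marker -ke
    ("রা", "NN"),    # plural marker -ra
]
# Check longer suffixes first so that the shortest (minimal) root is kept.
SUFFIX_RULES.sort(key=lambda rule: len(rule[0]), reverse=True)


def stem(word: str) -> tuple[str, str | None]:
    """Strip the longest matching suffix; return (root, suffix tag or None)."""
    for suffix, tag in SUFFIX_RULES:
        # Require a non-empty root after stripping the suffix.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], tag
    return word, None


def tag_word(word: str) -> str:
    """Vocabulary lookup first, then root lookup, then the suffix rule's tag."""
    if word in VOCAB:
        return VOCAB[word]
    root, suffix_tag = stem(word)
    if root in VOCAB:
        return VOCAB[root]  # default tag of the recovered root
    return suffix_tag or "UNK"  # suffix rule as fallback, else unknown


def tag_sentence(sentence: str) -> list[tuple[str, str]]:
    """Tag each whitespace-separated word independently."""
    return [(word, tag_word(word)) for word in sentence.split()]


if __name__ == "__main__":
    print(tag_sentence("ছেলেরা বইগুলো ভালো"))
```

Under these assumptions, "করলাম" is reduced to the root "কর" and tagged VB from the vocabulary. "খেলাম" has a root that is not in VOCAB, so it falls back to the VB tag of the -lam rule. A real system would also need context, because suffixes such as -ke or -ra can be ambiguous.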
Keywords
"Dictionaries","Tagging","Hidden Markov models","Natural language processing","Vocabulary","Training data","Speech"
Publisher
ieee
Conference_Titel
Computer and Information Technology (ICCIT), 2015 18th International Conference on
Type
conf
DOI
10.1109/ICCITechn.2015.7488111
Filename
7488111
Link To Document