Bangla Parts-of-Speech tagging using Bangla stemmer and rule based analyzer

Author

Md. Nesarul Hoque;Md. Hanif Seddiqui

Author_Institution

Dept. of Computer Science and Engineering, Port City International University, Chittagong, Bangladesh

fYear

2015

Firstpage

440

Lastpage

444

Abstract

Parts-of-Speech (POS) tagging plays a vital role in Natural Language Processing (NLP) applications such as machine translation, spell checking, information retrieval, speech processing and emotion analysis. Bangla is a highly inflectional language in which a single word can give rise to many variants. Although a few POS taggers exist for Bangla, very few of them exploit suffixes to identify the tags of words. We therefore propose an automated, suffix-based POS tagging system for Bangla. The system uses our own stemming technique to retrieve the minimal possible root of each word and applies rules according to the different forms of suffixes. It also incorporates a Bangla vocabulary of more than 45,000 words with their default tags and a pattern-based verb dataset, both of which help improve tagging efficiency. We evaluate the proposed system on a Bangla text corpus. The results show that the proposed Bangla POS tagger outperforms known related tagging systems.
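The abstract does not specify the stemmer, the suffix rules or the tagset. As a rough illustration only, the following Python sketch shows the general pattern it describes: look a word up in a default-tag vocabulary, otherwise strip the longest known suffix and check the root, and finally fall back on the category the suffix implies. The vocabulary `LEXICON`, the suffix table `SUFFIX_TAGS`, the tag names and the functions `tag_word` and `tag_sentence` are all hypothetical, and the verb dataset and the authors' actual rules are not modeled.

```python
# Minimal sketch of suffix-based Bangla POS tagging.
# ASSUMPTION: LEXICON and SUFFIX_TAGS are hypothetical toy data, not the
# authors' 45,000-word vocabulary or rule set; tag names are illustrative.
from typing import List, Tuple

# Hypothetical default-tag vocabulary (root -> default tag).
LEXICON = {
    "বই": "NOUN",        # book
    "ছেলে": "NOUN",      # boy
    "বাংলাদেশ": "NOUN",  # Bangladesh
    "কর": "VERB",        # root of "do"
    "ভালো": "ADJ",       # good
    "আমি": "PRON",       # I
}

# Hypothetical suffix rules (suffix -> word category the suffix implies).
SUFFIX_TAGS = {
    "গুলো": "NOUN",  # plural marker
    "রা": "NOUN",    # plural marker (animate nouns)
    "ের": "NOUN",    # genitive case marker
    "লাম": "VERB",   # simple past, first person
    "ছি": "VERB",    # present continuous, first person
}

# Try longer suffixes first so the most specific rule wins.
SUFFIXES = sorted(SUFFIX_TAGS, key=len, reverse=True)


def tag_word(word: str) -> str:
    """Return a POS tag for one word using lexicon lookup plus suffix rules."""
    # 1. The whole word is in the vocabulary: use its default tag.
    if word in LEXICON:
        return LEXICON[word]
    # 2. Strip a suffix; accept the stem only if the root is known and its
    #    default tag agrees with the category the suffix implies.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            root = word[: -len(suffix)]
            if LEXICON.get(root) == SUFFIX_TAGS[suffix]:
                return SUFFIX_TAGS[suffix]
    # 3. Unknown root: guess from the longest matching suffix alone.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return SUFFIX_TAGS[suffix]
    return "UNK"


def tag_sentence(sentence: str) -> List[Tuple[str, str]]:
    """Tag each whitespace-separated token of a sentence."""
    return [(w, tag_word(w)) for w in sentence.split()]


if __name__ == "__main__":
    print(tag_sentence("ছেলেরা বাংলাদেশের বইগুলো"))
```

For example, `tag_sentence("ছেলেরা বাংলাদেশের বইগুলো")` tags all three words as NOUN: ছেলেরা via the plural marker রা, বাংলাদেশের via the genitive marker ের and বইগুলো via the plural marker গুলো. Checking the root's default tag before trusting a suffix is one simple way to reject false stems; a practical tagger would need far larger tables and context-sensitive rules.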

Keywords

"Dictionaries","Tagging","Hidden Markov models","Natural language processing","Vocabulary","Training data","Speech"

Publisher

IEEE

Conference_Title

Computer and Information Technology (ICCIT), 2015 18th International Conference on

Type

conf

DOI

10.1109/ICCITechn.2015.7488111

Filename

7488111