DocumentCode :
2243044
Title :
Kernel based part of speech tagger for Kannada
Author :
Antony, P.J. ; Soman, K.P.
Author_Institution :
Comput. Eng. & Networking, Amrita Univ., Coimbatore, India
Volume :
4
fYear :
2010
fDate :
11-14 July 2010
Firstpage :
2139
Lastpage :
2144
Abstract :
This paper presents the development of a part-of-speech (POS) tagger for the Kannada language that can be used to analyze and annotate Kannada text. POS tagging is one of the basic tools and components needed by many Natural Language Processing (NLP) applications, such as speech recognition, natural language parsing, information retrieval, and information extraction. To meet this need for Kannada, we propose a new machine learning approach to POS tagging. Identifying the ambiguities in Kannada lexical items is the main challenge in developing an efficient and accurate POS tagger. We developed our own tagset of 30 tags and built a POS tagger for Kannada using a Support Vector Machine (SVM). A corpus of texts extracted from Kannada newspapers and books was manually morphologically analyzed and tagged with this tagset. We evaluated the system and found its results to be more efficient and accurate than those of earlier methods for Kannada POS tagging.
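The abstract names an SVM-based tagger but gives neither its features nor its training setup. What follows is a minimal, hypothetical sketch of the general technique, not the authors' system. It trains a one-vs-rest linear SVM with dual coordinate descent over a window of word features. The feature templates (`token_features`), the four tags (PRP, NN, JJ, VB), and the romanized toy sentences are all illustrative assumptions. They are not the paper's 30-tag tagset or its corpus.

```python
from collections import defaultdict


# Hypothetical feature templates (not from the paper): current word, neighbours,
# and a 2-character suffix, since suffixes carry much of the grammatical
# information in an agglutinative language such as Kannada.
def token_features(words, i):
    w = words[i]
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return {"bias": 1.0, "w=" + w: 1.0, "suf2=" + w[-2:]: 1.0,
            "prev=" + prev_w: 1.0, "next=" + next_w: 1.0}


def dot(weights, feats):
    # Sparse dot product; features missing from `weights` contribute 0.
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_binary_svm(xs, ys, C=10.0, epochs=100):
    """L1-loss linear SVM trained by dual coordinate descent; ys are +1/-1."""
    w = defaultdict(float)
    alpha = [0.0] * len(xs)
    q = [sum(v * v for v in x.values()) for x in xs]  # Q_ii = x_i . x_i
    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(xs, ys)):
            g = y * dot(w, x) - 1.0  # dual gradient with respect to alpha_i
            # Projected gradient: ignore moves that would leave the box [0, C].
            if alpha[i] == 0.0:
                pg = min(g, 0.0)
            elif alpha[i] == C:
                pg = max(g, 0.0)
            else:
                pg = g
            if pg != 0.0:
                old = alpha[i]
                alpha[i] = min(max(old - g / q[i], 0.0), C)
                for f, v in x.items():  # keep w = sum_i alpha_i * y_i * x_i
                    w[f] += (alpha[i] - old) * y * v
    return dict(w)


class SVMTagger:
    """Toy one-vs-rest linear SVM POS tagger (illustrative sketch only)."""

    def fit(self, tagged_sentences):
        xs, tags = [], []
        for sent in tagged_sentences:
            words = [w for w, _ in sent]
            for i, (_, t) in enumerate(sent):
                xs.append(token_features(words, i))
                tags.append(t)
        self.tagset = sorted(set(tags))
        # One binary classifier per tag: +1 for that tag, -1 for every other tag.
        self.models = {
            t: train_binary_svm(xs, [1.0 if g == t else -1.0 for g in tags])
            for t in self.tagset
        }
        return self

    def tag(self, words):
        # Each token receives the tag whose classifier gives the highest score.
        return [
            max(self.tagset,
                key=lambda t: dot(self.models[t], token_features(words, i)))
            for i in range(len(words))
        ]


# Hypothetical romanized toy sentences with a hypothetical 4-tag set.
TOY_CORPUS = [
    [("naanu", "PRP"), ("manege", "NN"), ("hoguttene", "VB")],
    [("avanu", "PRP"), ("shaalege", "NN"), ("hoda", "VB")],
    [("avanu", "PRP"), ("dodda", "JJ"), ("mane", "NN"), ("kattida", "VB")],
]

if __name__ == "__main__":
    tagger = SVMTagger().fit(TOY_CORPUS)
    print(tagger.tag(["avanu", "dodda", "mane", "kattida"]))
```

Scoring each token independently keeps the sketch short. A practical tagger would likely also use previously predicted tags and richer suffix and morphological features, and would be trained on a manually annotated corpus such as the one the abstract describes.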
Keywords :
learning (artificial intelligence); natural language processing; support vector machines; Kannada language; Kannada lexical items; machine learning; part-of-speech tagger; support vector machine; Artificial neural networks; Classification algorithms; Context; Machine learning; Support vector machines; Tagging; Training; Classification; Kannada; NLP; POS Tagger; Support Vector Machine;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics (ICMLC), 2010 International Conference on
Conference_Location :
Qingdao
Print_ISBN :
978-1-4244-6526-2
Type :
conf
DOI :
10.1109/ICMLC.2010.5580488
Filename :
5580488