Title of article :
MALAY PART OF SPEECH TAGGER: A COMPARATIVE STUDY ON TAGGING TOOLS
Author/Authors :
MOHAMED, HASSAN Universiti Kebangsaan Malaysia - Centre for Artificial Intelligence and Technology (CAIT), Faculty of Information Science Technology - Knowledge Technology Research Group, Malaysia , OMAR, NAZLIA Universiti Kebangsaan Malaysia - Centre for Artificial Intelligence and Technology (CAIT), Faculty of Information Science Technology - Knowledge Technology Research Group, Malaysia , AB. AZIZ, MOHD. JUZAIDDIN Universiti Kebangsaan Malaysia - Centre for Artificial Intelligence and Technology (CAIT), Faculty of Information Science Technology - Knowledge Technology Research Group, Malaysia
From page :
11
To page :
23
Abstract :
Malay is an agglutinative language with rich morphology. Affixation to a root word is the most common morphological process for deriving a new word with a different meaning, and it can change the word's part of speech (POS). Because no annotated Malay corpus is freely available, no published work compares the performance of POS tagging with the Hidden Markov Model (HMM), Maximum Entropy (ME) and Support Vector Machine (SVM) approaches, especially with regard to the effect of Malay morphology on tagging unknown words. This paper presents an evaluation of TnT (HMM), MaxEnt (ME) and SVMTool (SVM). To train and test these methods on Malay, a Malay corpus in the health domain was annotated. TnT was modified to accommodate prefix and circumfix features. The experimental results show that SVMTool outperforms TnT and MaxEnt in overall accuracy (99.23% for SVMTool, 94% for TnT and 96% for MaxEnt) and in accuracy on unknown words (96.78% for SVMTool, 67% for TnT and 86.23% for MaxEnt). MaxEnt outperforms TnT in both overall accuracy and unknown-word tagging. Since SVMTool reaches 96.78% accuracy on unknown words, it would be the best of the three tools for tagging Malay text in a specific domain.
Keywords :
Malay POS tagger, Malay morphemes, unknown words
Journal title :
Asia-Pacific Journal of Information Technology and Multimedia
Record number :
2699021