DocumentCode :
618741
Title :
A comparative study on different techniques for Thai part-of-speech tagging
Author :
Pailai, Jaruwat ; Kongkachandra, Rachada ; Supnithi, Thepchai ; Boonkwan, Prachya
Author_Institution :
Dept. of Comput. Sci., Thammasat Univ., Pathumthani, Thailand
fYear :
2013
fDate :
15-17 May 2013
Firstpage :
1
Lastpage :
5
Abstract :
Natural language processing (NLP) for Thai is difficult to apply to real tasks because Thai sentences have a complex sequential structure. POS tagging can improve the accuracy of syntactic analysis and thereby support improvements in many NLP tasks. We identify a supervised machine learning method suited to annotating POS types for Thai by comparing Support Vector Machines (SVMs) and Conditional Random Fields (CRFs). The BEST 2012 News and Entertainments corpus is used in our experiments. Because the sequential character of Thai is of particular interest, we use it as a feature in the training set. Our sequential features comprise forward 3-grams, backward 3-grams, and 5-grams. The best accuracy in our experiments is 93.638%, obtained by SVM POS tagging trained on words with forward 3-grams when the training data contains ten thousand tokens. With the same training data, the best CRF accuracy is very close to that of the SVM: 93.254%, obtained when the learning form is words with POS tags over 5-grams.
Keywords :
identification technology; learning (artificial intelligence); natural language processing; support vector machines; CRF; NLP; SVM POS tagging; Thai language; Thai part-of-speech tagging; complex sequential structure; conditional random fields; natural language processing; supervised machine learning; support vector machine; syntactic analysis; Accuracy; Hidden Markov models; Natural language processing; Speech; Speech processing; Support vector machines; Tagging; Conditional Random Fields (CRFs); N-Gram; Natural Language Processing (NLP); Support Vector Machine (SVM); Thai Part of Speech Tagging;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2013 10th International Conference on
Conference_Location :
Krabi
Print_ISBN :
978-1-4799-0546-1
Type :
conf
DOI :
10.1109/ECTICon.2013.6559527
Filename :
6559527