DocumentCode
2500908
Title
Thai named entity recognition based on conditional random fields
Author
Tirasaroj, Nutcha ; Aroonmanakun, Wirote
Author_Institution
Dept. of Linguistics, Chulalongkorn Univ., Bangkok, Thailand
fYear
2009
fDate
20-22 Oct. 2009
Firstpage
216
Lastpage
220
Abstract
This paper presents Thai named entity recognition (NER) systems based on Conditional Random Fields (CRFs). Previous Thai NER systems have taken word-segmented data as input; none has used syllable-segmented data. Because some studies of NER in other languages, such as Chinese, show that character-based systems outperform word-based ones, this study investigates whether syllable-segmented input improves Thai NER. To compare a system that takes word-segmented input with one that takes syllable-segmented input, two sets of features are used. The experimental results show that the systems do not perform well enough because few features are used. However, they reveal that the syllable-based system is slightly better than the word-based one. The paper also describes the corpus, the training data preparation, and the system overview.
Keywords
data handling; natural language processing; random processes; Thai named entity recognition; conditional random field; syllable-segmented input; word-segmented data; Art; Data mining; Entropy; Graphical models; Labeling; Machine learning; Natural languages; Training data
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing, 2009. SNLP '09. Eighth International Symposium on
Conference_Location
Bangkok
Print_ISBN
978-1-4244-4138-9
Electronic_ISBN
978-1-4244-4139-6
Type
conf
DOI
10.1109/SNLP.2009.5340913
Filename
5340913