• DocumentCode
    2500908
  • Title
    Thai named entity recognition based on conditional random fields
  • Author
    Tirasaroj, Nutcha ; Aroonmanakun, Wirote
  • Author_Institution
    Dept. of Linguistics, Chulalongkorn Univ., Bangkok, Thailand
  • fYear
    2009
  • fDate
    20-22 Oct. 2009
  • Firstpage
    216
  • Lastpage
    220
  • Abstract
    This paper presents Thai named entity recognition (NER) systems based on Conditional Random Fields (CRFs). Previous Thai NER systems have taken word-segmented data as input; none has used syllable-segmented data. Since some studies of NER in other languages, such as Chinese, show that character-based systems perform better than word-based ones, this study also investigates whether syllable-segmented input improves Thai NER. To compare a system taking word-segmented input with one taking syllable-segmented input, two sets of features are used. The experimental results show that the systems do not perform well enough because few features are used. However, they reveal that the syllable-based system is slightly better than the word-based one. The corpus, training data preparation, and system overview are also described.
  • Keywords
    data handling; natural language processing; random processes; Thai named entity recognition; conditional random field; syllable-segmented input; word-segmented data; Art; Data mining; Entropy; Graphical models; Labeling; Machine learning; Natural languages; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing, 2009. SNLP '09. Eighth International Symposium on
  • Conference_Location
    Bangkok
  • Print_ISBN
    978-1-4244-4138-9
  • Electronic_ISBN
    978-1-4244-4139-6
  • Type
    conf
  • DOI
    10.1109/SNLP.2009.5340913
  • Filename
    5340913