  • DocumentCode
    3102807
  • Title
    An Experimental Study on Vietnamese POS Tagging
  • Author
    Tran, Oanh Thi; Le, Cuong Anh; Ha, Thuy Quang; Le, Quynh Hoang
  • Author_Institution
    Inf. Syst. Dept., VNUH, Hanoi, Vietnam
  • fYear
    2009
  • fDate
    7-9 Dec. 2009
  • Firstpage
    23
  • Lastpage
    27
  • Abstract
    In Natural Language Processing (NLP), part-of-speech (POS) tagging is one of the important tasks. However, it has not drawn much attention from Vietnamese researchers around the world. In this paper, we present an experimental study on Vietnamese POS tagging. Motivated by research on Chinese and by the characteristics of Vietnamese, we propose a new kind of feature based on the idea of word composition, which we call morpheme-based features. To verify the effectiveness of these features, we use three powerful machine learning techniques: Maximum Entropy Models (MEM), Conditional Random Fields (CRF), and Support Vector Machines (SVM). In addition, we built a Vietnamese POS-tagged corpus of approximately 8,000 sentences from different genres to conduct the experiments. Experimental results show that morpheme-based features always give higher precision than the word-based features typically used in previous approaches. Using these morpheme-based features, we achieved a precision of 91.64%.
  • Keywords
    learning (artificial intelligence); linguistics; natural language processing; Vietnamese POS tagging; machine learning techniques; morpheme-based features; part-of-speech tagging; Buildings; Computer science; Educational institutions; Entropy; Hidden Markov models; Information systems; Learning systems; Natural languages; Support vector machines; Tagging; Vietnamese POS-tagged corpus; word-based features;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Asian Language Processing, 2009. IALP '09. International Conference on
  • Conference_Location
    Singapore
  • Print_ISBN
    978-0-7695-3904-1
  • Type
    conf
  • DOI
    10.1109/IALP.2009.14
  • Filename
    5380795