• DocumentCode
    2789878
  • Title

    A Hybrid Approach to Vietnamese Word Segmentation Using Part of Speech Tags

  • Author

    Pham, Dang Duc ; Tran, Giang Binh ; Pham, Son Bao

  • Author_Institution
    Human Machine Interaction Lab., Vietnam Nat. Univ., Hanoi, Vietnam
  • fYear
    2009
  • fDate
    13-17 Oct. 2009
  • Firstpage
    154
  • Lastpage
    161
  • Abstract
    Word segmentation is one of the most important tasks in NLP. For Vietnamese, the characteristics of the language make this task challenging, especially in determining word boundaries. To tackle Vietnamese word segmentation, in this paper we propose the WS4VN system, which uses a new approach that combines the maximum matching algorithm with stochastic models using part-of-speech information. The approach can resolve word ambiguity and choose the best segmentation for each input sentence. Our system gives a promising result with an F-measure of 97%, higher than the results of existing publicly available Vietnamese word segmentation systems.
  • Keywords
    natural language processing; speech recognition; Vietnamese word segmentation; WS4VN system; maximum matching algorithm; speech information; speech tags; stochastic models; Educational institutions; Humans; Information technology; Knowledge engineering; Laboratories; Natural languages; Speech; Systems engineering and theory; Tagging; Tin; Word segmentation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Knowledge and Systems Engineering, 2009. KSE '09. International Conference on
  • Conference_Location
    Hanoi
  • Print_ISBN
    978-1-4244-5086-2
  • Electronic_ISBN
    978-0-7695-3846-4
  • Type

    conf

  • DOI
    10.1109/KSE.2009.44
  • Filename
    5361713