• DocumentCode
    1908723
  • Title
    Machine Translation Approach for Vietnamese Diacritic Restoration
  • Author
    Thi Ngoc Diep Do ; Duy Binh Nguyen ; Dang Khoa Mac ; Do Dat Tran
  • Author_Institution
    MICA Inst., Hanoi Univ. of Sci. & Technol., Hanoi, Vietnam
  • fYear
    2013
  • fDate
    17-19 Aug. 2013
  • Firstpage
    103
  • Lastpage
    106
  • Abstract
    Diacritic marks exist in many languages, such as French, German, Slovak, and Vietnamese. However, they are sometimes omitted in writing, which can make text without diacritics ambiguous for the reader. Automatic diacritic restoration has been addressed for several languages using character-based, word-based, and point-wise approaches, among others. However, these approaches rely heavily on linguistic information and on the size of the training corpus, and they are sometimes language dependent. This paper presents a simple and effective restoration method that uses a machine translation approach as a new solution to the problem. The method has been applied to Vietnamese and integrated into an Android application named VIVA (Vietnamese Voice Assistant), which reads out the content of incoming text messages on a mobile phone. Our experiments show that the proposed method recovers diacritic marks with 99.0% accuracy.
  • Keywords
    language translation; natural language processing; Android application; VIVA; Vietnamese Voice Assistant; Vietnamese diacritic restoration; Vietnamese language; automatic diacritic restoration problem; diacritic mark; linguistics information; machine translation; Accuracy; Smart phones; Speech; Training; Training data; Writing; diacritics restoration; statistical machine translation; text message; vietnamese;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asian Language Processing (IALP), 2013 International Conference on
  • Conference_Location
    Urumqi
  • Type
    conf
  • DOI
    10.1109/IALP.2013.30
  • Filename
    6646014