• DocumentCode
    672855
  • Title
    Extended word similarity based clustering on unsupervised PoS induction to improve English-Indonesian statistical machine translation
  • Author
    Sujaini, Herry ; Purwarianti, Ayu ; Arman, Arry Ahkmad ; Kuspriyanto
  • Author_Institution
    Sch. of Electr. Eng. & Inf., Bandung Inst. of Technol., Bandung, Indonesia
  • fYear
    2013
  • fDate
    25-27 Nov. 2013
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    In this paper, we present an unsupervised Part-of-Speech (PoS) induction algorithm to improve translation quality in statistical machine translation. The proposed algorithm extends the Word-Similarity-Based (WSB) clustering algorithm. In this clustering, the similarity between words is measured by their grammatical relations with other words, where a grammatical relation is represented as an n-gram relation. We extend WSB clustering by taking the preceding words into account when measuring the grammatical relation. The clustering results are then used in English-Indonesian statistical machine translation. The experiments were conducted using MOSES as the machine translation decoder and were evaluated by BLEU score. Using 14,000 English-Indonesian sentence pairs, the clustering improved the BLEU score by 2.07%.
  • Keywords
    language translation; natural language processing; statistical analysis; unsupervised learning; English-Indonesian statistical machine translation; MOSES; extended word similarity based clustering; grammatical relation; machine translation decoder; unsupervised PoS induction; unsupervised part-of-speech induction algorithm; Accuracy; Clustering algorithms; Computational linguistics; Computational modeling; Equations; Hidden Markov models; Tagging; English-Indonesian; Unsupervised PoS Induction; Word Clustering;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013 International Conference
  • Conference_Location
    Gurgaon
  • Type
    conf
  • DOI
    10.1109/ICSDA.2013.6709880
  • Filename
    6709880