Title :
Extended word similarity based clustering on unsupervised PoS induction to improve English-Indonesian statistical machine translation
Author :
Sujaini, Herry ; Purwarianti, Ayu ; Arman, Arry Ahkmad ; Kuspriyanto
Author_Institution :
Sch. of Electr. Eng. & Inf., Bandung Inst. of Technol., Bandung, Indonesia
Abstract :
In this paper, we present an unsupervised Part-of-Speech (PoS) induction algorithm to improve translation quality in statistical machine translation. The proposed algorithm extends the Word-Similarity-Based (WSB) clustering algorithm. In this clustering, the similarity between words is measured by their grammatical relations with other words, where a grammatical relation is represented as an n-gram relation. We extend WSB clustering by taking the preceding words into account when measuring the grammatical relation. The clustering results are then used in English-Indonesian statistical machine translation. The experiments were conducted using MOSES as the machine translation decoder and were evaluated by BLEU score. Using 14,000 English-Indonesian sentence pairs, the clustering improved the BLEU score by 2.07%.
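Illustrative sketch (not taken from the paper): the abstract does not give the similarity formula, so the snippet below only illustrates the general idea of comparing words by their n-gram neighbours and then also counting the preceding word. The use of bigram counts, cosine similarity, the toy corpus, and the names `context_vectors` and `cosine` are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus (hypothetical, for illustration only).
SENTENCES = [
    "the cat sleeps",
    "the dog sleeps",
    "a cat eats",
    "a dog runs",
]


def context_vectors(sentences, use_previous=True):
    """Map each word to counts of its neighbouring words.

    Assumption: a "grammatical relation" is approximated by bigram
    counts. With use_previous=False only the following word is counted;
    with use_previous=True the preceding word is counted as well.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            word = tokens[i]
            vectors[word][("next", tokens[i + 1])] += 1
            if use_previous:
                vectors[word][("prev", tokens[i - 1])] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[feature] for feature, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


following_only = context_vectors(SENTENCES, use_previous=False)
with_previous = context_vectors(SENTENCES, use_previous=True)

print(f"{cosine(following_only['cat'], following_only['dog']):.2f}")
print(f"{cosine(with_previous['cat'], with_previous['dog']):.2f}")
```

On this toy corpus, "cat" and "dog" share only one following word but both preceding words, so their similarity rises from 0.50 to 0.75 once the preceding-word counts are included. A clustering step over such similarities would then group words into induced PoS-like classes; that step is omitted here because the abstract does not describe it.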
Keywords :
language translation; natural language processing; statistical analysis; unsupervised learning; English-Indonesian statistical machine translation; MOSES; extended word similarity based clustering; grammatical relation; machine translation decoder; unsupervised PoS induction; unsupervised part-of-speech induction algorithm; Accuracy; Clustering algorithms; Computational linguistics; Computational modeling; Equations; Hidden Markov models; Tagging; English-Indonesian; Unsupervised PoS Induction; Word Clustering;
Conference_Title :
Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013 International Conference
Conference_Location :
Gurgaon
DOI :
10.1109/ICSDA.2013.6709880