DocumentCode
1625618
Title
A pool-based active learning method for improving Farsi-English Machine Translation system
Author
Bakhshaei, Somayeh ; Khadivi, Shahram
Author_Institution
Comput. Sci. & Inf. Theor. Dept., Amirkabir Univ. of Technol., Tehran, Iran
fYear
2012
Firstpage
822
Lastpage
826
Abstract
In this paper we try to alleviate the problem of scarce resources for developing a Farsi-English Statistical Machine Translation (SMT) system. We do this by applying the Active Learning (AL) idea to choose the more informative sentences, which are translated by a human and then added to the baseline corpus. Human translations are more valuable than those obtained by other approaches to corpus gathering (such as automatic approaches), but they are also more costly. Selecting sentences in this way lets us improve the translation system at lower cost. This is done in interaction with a human translator. Applying the Active Learning idea to an SMT system turns it into a system that can improve its baseline corpus by asking for the essential data that leads directly to system improvement. In addition, combining the AL idea with SMT is a way of using source-side monolingual resources to improve SMT systems, which is ignored in the original theory of SMT. Our results for the Farsi-English system show an improvement compared to random sentence selection.
Keywords
language translation; learning (artificial intelligence); Farsi-English machine translation system; SMT system; base-line corpus; human translations; human translator; informative sentences; pool-based active learning method; random sentence selection; source side monolingual resources; Current measurement; Data models; Face; Feature extraction; Mathematical model; Uncertainty; Active Learning; Farsi-English SMT; Persian language; Scarce resources;
fLanguage
English
Publisher
ieee
Conference_Titel
Telecommunications (IST), 2012 Sixth International Symposium on
Conference_Location
Tehran
Print_ISBN
978-1-4673-2072-6
Type
conf
DOI
10.1109/ISTEL.2012.6483099
Filename
6483099