DocumentCode :
573565
Title :
State-of-the-art English to Persian Statistical Machine Translation system
Author :
Mansouri, Amin ; Faili, Heshaam
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Tehran, Tehran, Iran
fYear :
2012
fDate :
2-3 May 2012
Firstpage :
174
Lastpage :
179
Abstract :
This paper reports a comparison of several kinds of English-Persian Statistical Machine Translation (SMT) systems. A large parallel corpus containing about 6 million tokens on each side was developed to train the proposed SMT system. While building the parallel corpus, a noise-filtering system based on a MaxEnt classifier was devised to distinguish correct sentence pairs from incorrect ones. Using the resulting parallel corpus, a variety of English-to-Persian SMT systems were developed. Several variations on SMT, such as hybrid MT and statistical post-editing MT, are also proposed. All systems were tested on two different types of test set: one extracted randomly from the parallel corpus, and the other containing formal English sentences extracted from an English learning book. The results show that a hybrid system, in which SMT is augmented by rule-based detection of English phrasal verbs and Persian compound verbs, improves significantly on the baseline. In addition, Verb-aware SMT obtains state-of-the-art results for English-Persian translation as measured by BLEU.
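The abstract states only that the noise filter is a MaxEnt classifier that separates correct sentence pairs from incorrect ones. The sketch below illustrates that general idea under clearly stated assumptions: a binary MaxEnt model (equivalent to logistic regression) fitted by gradient ascent on an L2-regularized log-likelihood, using two hypothetical features (token-length agreement, and coverage under a toy English-to-transliterated-Persian dictionary) and a tiny hand-labelled sample. None of the names (`features`, `train_maxent`, `filter_corpus`, `TOY_DICT`), features, or data come from the paper; this is not the authors' method.

```python
import math

# HYPOTHETICAL toy English -> (transliterated) Persian dictionary; the paper's
# real features and lexical resources are not described in the abstract.
TOY_DICT = {"book": "ketab", "house": "khane", "good": "khub", "water": "ab"}


def features(src: str, tgt: str) -> list[float]:
    """Hypothetical feature vector for an (English, Persian) sentence pair."""
    s, t = src.lower().split(), tgt.lower().split()
    # Length agreement in [0, 1]; 1.0 means both sides have equal token counts.
    ratio = min(len(s), len(t)) / max(len(s), len(t), 1)
    # Fraction of dictionary-known source words whose translation appears in tgt.
    known = [w for w in s if w in TOY_DICT]
    coverage = sum(TOY_DICT[w] in t for w in known) / len(known) if known else 0.0
    return [1.0, ratio, coverage]  # the leading 1.0 acts as a bias feature


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_maxent(data, epochs=2000, lr=0.5, l2=1e-3):
    """Binary MaxEnt (equivalently, logistic regression) fitted by gradient
    ascent on the L2-regularized average log-likelihood."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                grad[i] += (y - p) * xi  # d(log-likelihood)/dw_i for one example
        w = [wi + lr * (g / len(data) - l2 * wi) for wi, g in zip(w, grad)]
    return w


def prob_correct(w, src: str, tgt: str) -> float:
    """Model probability that (src, tgt) is a correct translation pair."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(src, tgt))))


def filter_corpus(w, pairs, threshold=0.5):
    """Keep only the pairs the classifier judges to be correct translations."""
    return [(s, t) for s, t in pairs if prob_correct(w, s, t) >= threshold]


# Tiny hand-labelled sample: 1 = correct pair, 0 = noisy pair (illustrative only).
LABELLED = [
    ("the book is good", "ketab khub ast", 1),
    ("water for the house", "ab baraye khane", 1),
    ("a good house", "khane khub", 1),
    ("the book is good", "man be madrese raftam emruz sobh", 0),
    ("water for the house", "salam", 0),
    ("a good house", "ab", 0),
]
WEIGHTS = train_maxent([(features(s, t), y) for s, t, y in LABELLED])

if __name__ == "__main__":
    candidates = [("the good water here", "ab khub inja"), ("good water", "salam")]
    print(filter_corpus(WEIGHTS, candidates))
```

Pairs whose predicted probability of being correct falls below the threshold are dropped. In a real system the feature set and threshold would be chosen and tuned on held-out labelled pairs; the two features here were picked only because they are easy to compute and explain.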
Keywords :
knowledge based systems; language translation; natural language processing; pattern classification; statistical analysis; BLEU measure; English learning book; English phrasal verb; English-Persian statistical machine translation system; MaxEnt classifier; Persian compound verb; SMT system; hybrid MT system; noisy filtering system; parallel corpus; rule based detection; sentence pairs; statistical post editing MT system; verb-aware SMT system; Compounds; Feature extraction; Filtering; Google; Noise measurement; Training; Hybrid Machine Translation; MaxEnt Classifier; Parallel Corpus; Statistical Machine Translation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Artificial Intelligence and Signal Processing (AISP), 2012 16th CSI International Symposium on
Conference_Location :
Shiraz, Fars
Print_ISBN :
978-1-4673-1478-7
Type :
conf
DOI :
10.1109/AISP.2012.6313739
Filename :
6313739