Regional Information Center for Science and Technology (RICEST) - Factored phrase-based statistical machine translation

DocumentCode :

3632198

Title :

Factored phrase-based statistical machine translation

Author :

Dan Tufis; Alexandru Ceausu

Author_Institution :

Research Institute for Artificial Intelligence, Romanian Academy, Bucharest, Romania

fYear :

2009

Firstpage :

Lastpage :

Abstract :

We describe the results of a short-term SEE-ERAnet project that investigated the feasibility of machine translation (MT) research and development for several South Slavic and Balkan languages. The major tasks of the project were: compiling a multilingual parallel corpus for the languages concerned; marking up the corpus in XML (tokenization, lemmatization, tagging); aligning the corpus at sentence and word level; and building the statistical translation models. Additionally, based on the created resources and models, we conducted preliminary experiments on building prototype MT systems for Romanian <-> English, Greek <-> English and Slovene <-> English. We argue that by investing effort in building accurate language resources (the larger the better) and in fine-tuning the statistical parameters, current machine-learning technologies can be used to quickly develop acceptable MT prototypes, which are valuable starting points for implementing working systems. We substantiate this claim with recent results from a follow-up national project aimed at developing a Romanian <-> English translation system.

Keywords :

"Natural languages","XML","Tagging","Prototypes","Artificial intelligence","Research and development","Decoding","Training data","Learning systems"

Publisher :

IEEE

Conference_Title :

Speech Technology and Human-Computer Dialogue, 2009. SpeD '09. Proceedings of the 5th Conference on

Print_ISBN :

978-1-4244-4727-5

Type :

conf

DOI :

10.1109/SPED.2009.5156180

Filename :

5156180

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3632198