DocumentCode :
3727182
Title :
Language localisation of Tamil using Statistical Machine Translation
Author :
Y. Achchuthan;K. Sarveswaran
Author_Institution :
Department of Computer Science, University of Jaffna, Sri Lanka
fYear :
2015
Firstpage :
125
Lastpage :
129
Abstract :
Language localisation, in which the strings in an interface and its documentation are translated into a new language, is a rigorous and time-consuming task. Machine translation systems, specifically Statistical Machine Translation (SMT) systems, have on the other hand been used successfully for many language pairs. A few SMT systems have been developed for the generic domain; however, no systems are yet available to aid localisation. This research proposes a new methodology in which language localisation can be done using SMT, and identifies suitable parameters on which an SMT-aided localisation system could be built. A pilot system has been developed and is outlined in this paper. A RESTful API has also been developed to facilitate localisation in remote tools. Several open source software packages have already been translated into Tamil. The translated English-Tamil pairs were collected from various language resource files, then cleaned, tokenised and used to train the system. Another similar system was prepared with data from the generic domain apart from the collected technical data. The systems were trained with 2-gram, 3-gram and 4-gram language models created using two different language modelling tools, namely KenLM and IRSTLM. The results were then evaluated using the BLEU algorithm, and appropriate parameters for setting up an SMT system for localisation were identified from the evaluation. The results show that it is sufficient to train a system with a 3-gram language model, and that the modified BLEU algorithm gives a better understanding of the results compared to its original implementation. Further, KenLM was found to perform better than IRSTLM in terms of both accuracy of results and speed of execution.
Keywords :
Google
Publisher :
ieee
Conference_Titel :
Advances in ICT for Emerging Regions (ICTer), 2015 Fifteenth International Conference on
Print_ISBN :
978-1-4673-9440-6
Type :
conf
DOI :
10.1109/ICTER.2015.7377677
Filename :
7377677