Title :
Statistical machine translation of systems for Sinhala - Tamil
Author :
Sripirakas, S. ; Weerasinghe, A.R. ; Herath, Dulip L.
Author_Institution :
Sch. of Comput., Univ. of Colombo, Colombo, Sri Lanka
fDate :
Sept. 29 2010-Oct. 1 2010
Abstract :
The statistical approach is one of the most promising and widely used machine translation strategies. It applies even to structurally dissimilar language pairs and has proven suitable for translating large volumes of text. There has been growing demand for automatic translation between Sinhala and Tamil for several decades. Since no machine translation tool exists for this language pair, a statistical approach is the most suitable way to fill the gap. Because the two languages are similar, a statistical approach can perform well without requiring extensive linguistic knowledge. In this research, a basic translation system was modelled and implemented, using parallel corpora prepared from Parliament order papers. This paper reports only preliminary system runs, without parameter refinements or the final design and evaluation strategies. The language model, translation model and decoder were configured in line with recent literature. To improve output quality, the Minimum Error Rate Training (MERT) technique is integrated to tune the decoder. To avoid relying solely on BLEU, two other automatic metrics, TER and NIST, are used to evaluate different aspects of the output. Finally, directions for future research toward refining the system are identified.
Keywords :
language translation; natural language processing; text analysis; BLEU; MERT technique; Sinhala-Tamil translation; automatic metrics; automatic translation; decoder configuration; language model; language similarity; large text translation; linguistic knowledge; statistical machine translation; structurally dissimilar language pairs; translation model; Data models; Decoding; Hidden Markov models; Measurement; NIST; Training; Tuning; Sinhala; Tamil; machine translation;
Conference_Title :
Advances in ICT for Emerging Regions (ICTer), 2010 International Conference on
Conference_Location :
Colombo
Print_ISBN :
978-1-4244-9041-7
DOI :
10.1109/ICTER.2010.5643268