DocumentCode :
264055
Title :
A study of certain morphological structures of Kazakh and their impact on the machine translation quality
Author :
Bekbulatov, Eldar ; Kartbayev, Amandyk
Author_Institution :
Lab. of Intell. Inf. Syst., Farabi Kazakh Nat. Univ., Almaty, Kazakhstan
fYear :
2014
fDate :
15-17 Oct. 2014
Firstpage :
1
Lastpage :
5
Abstract :
This paper describes a morphological analysis of Kazakh for Kazakh-English statistical machine translation that modifies Kazakh compound words, and explores the effect of the modified input on translation quality when training on a large number of sentences. The word alignment problem becomes more serious when translating from a morphologically rich language such as Kazakh into a morphologically simple one such as English, because word forms that occur in many different morphological variants cause data sparseness. We present our investigations of unsupervised Kazakh morphological segmentation over a newspaper corpus and compare unsupervised segmentation against rule-based language processing tools. Our experimental results show that the proposed method can improve word alignment and translation quality.
Keywords :
computational linguistics; language translation; natural language processing; word processing; Kazakh language; Kazakh-English statistical machine translation; compound word; data sparseness; machine translation quality; morphological analysis; morphological structure; newspaper corpus; rule-based language processing tool; training sentence; translation word; unsupervised Kazakh morphological segmentation; unsupervised segmentation; word alignment problem; Automata; Instruments; Morphology; Pragmatics; Smoothing methods; Training; computational linguistics; Kazakh morphology; machine translation; word segmentation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Application of Information and Communication Technologies (AICT), 2014 IEEE 8th International Conference on
Conference_Location :
Astana
Print_ISBN :
978-1-4799-4120-9
Type :
conf
DOI :
10.1109/ICAICT.2014.7036013
Filename :
7036013