Title :
Better statistical estimation can benefit all phrases in phrase-based statistical machine translation
Author :
Sima'an, Khalil ; Mylonakis, Markos
Author_Institution :
Inst. for Logic, Univ. of Amsterdam, Amsterdam
Abstract :
The heuristic estimates of conditional phrase translation probabilities are based on frequency counts in a word-aligned parallel corpus. Earlier attempts at more principled estimation using Expectation-Maximization (EM) underperform this heuristic. This paper shows that a recently introduced novel estimator based on smoothing might provide a good alternative. When all phrase pairs are estimated (no length cut-off), this estimator slightly outperforms the heuristic estimator.
Keywords :
expectation-maximisation algorithm; language translation; smoothing methods; conditional phrase translation probabilities; expectation-maximization; phrase-based statistical machine translation; statistical estimation; word-aligned parallel corpus; Concurrent computing; Containers; Data mining; Frequency estimation; Logic; Parameter estimation; Probability; State estimation; Training data; Transduction
Conference_Title :
Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
Conference_Location :
Goa
Print_ISBN :
978-1-4244-3471-8
Electronic_ISBN :
978-1-4244-3472-5
DOI :
10.1109/SLT.2008.4777884