Title :
Experiments in word-reordering and morphological preprocessing for transducer-based statistical machine translation
Author :
De Gispert, Adrià ; Mariño, José B.
Author_Institution :
TALP Res. Center, Univ. Politècnica de Catalunya, Barcelona, Spain
fDate :
30 Nov.-3 Dec. 2003
Abstract :
Statistical speech translation can be achieved by an integrated search procedure that performs speech recognition and translation simultaneously. Based on finite-state transducers and Viterbi search, the approach requires the target language to be rewritten into a set of modified units (each containing zero, one or more original words), so that the search over the speech signal remains monotonic without changing the word order of the target language. In this paper, we analyse the effect of non-monotonic word alignments in two different Spanish-to-English translation tasks (the speech-aimed, small-vocabulary Verbmobil task and the text-aimed, large-vocabulary European Parliament task), identifying the most frequent crossing patterns and experimenting with reordering strategies to improve the transducer probabilities. In addition, we present preliminary results on introducing POS tagging and lemmatization, as well as other preprocessing such as categorization, to help improve the training of the system.
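Illustrative sketch (not from the paper): the abstract's idea of rewriting the target side into units of zero, one or more words, so that reading the source left to right still emits the target in its original order, can be shown with a small Python example. It assumes a word alignment given as a set of (source index, target index) pairs and a non-empty source sentence. The function names monotone_units and crossing_links, the emission rule (each target word is attached to the highest source position reached so far among its aligned words; unaligned target words stay at the current position), and the Spanish/English example are all hypothetical choices made for illustration. They are not the authors' exact procedure.

```python
from itertools import combinations


def monotone_units(src, tgt, links):
    """Attach each target word to one source position (hypothetical rule).

    Concatenating the returned units in source order reproduces tgt
    unchanged, so a left-to-right (monotonic) search over src can emit
    the target sentence without reordering it. Assumes src is non-empty.
    """
    # Highest source position aligned to each target word (None if unaligned).
    max_src = [None] * len(tgt)
    for i, j in links:
        if max_src[j] is None or i > max_src[j]:
            max_src[j] = i

    units = [[] for _ in src]
    pos = 0  # emission position only ever moves forward
    for j, word in enumerate(tgt):
        if max_src[j] is not None:
            pos = max(pos, max_src[j])
        # Unaligned target words are emitted at the current position.
        units[pos].append(word)
    return units


def crossing_links(links):
    """Return pairs of alignment links whose order differs on the two sides."""
    return [(a, b) for a, b in combinations(sorted(links), 2)
            if (a[0] - b[0]) * (a[1] - b[1]) < 0]


if __name__ == "__main__":
    src = ["la", "casa", "verde"]
    tgt = ["the", "green", "house"]
    links = {(0, 0), (1, 2), (2, 1)}  # la-the, casa-house, verde-green
    print(monotone_units(src, tgt, links))  # [['the'], [], ['green', 'house']]
    print(crossing_links(links))            # [((1, 2), (2, 1))]
```

In this toy case the adjective-noun inversion produces one crossing, and the source word "casa" receives an empty unit while "verde" carries two target words. This is the kind of non-monotonic pattern that reordering strategies aim to reduce.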
Keywords :
language translation; speech recognition; POS-tagging; Spanish/English translation; Viterbi search; categorization; cross patterns; finite-state transducers; integrated search procedure; integrated speech recognition/translation; lemmatization; morphological preprocessing; nonmonotonic word alignments; search monotonicity; speech-aimed small vocabulary Verbmobil task; text-aimed large-vocabulary European Parliament task; transducer-based statistical machine translation; word-reordering; Acoustic transducers; Equations; Natural languages; Pattern analysis; Speech analysis; Speech processing; Speech recognition; Text recognition; Viterbi algorithm; Vocabulary;
Conference_Title :
Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on
Print_ISBN :
0-7803-7980-2
DOI :
10.1109/ASRU.2003.1318514