Title :
Sentence segmentation and punctuation recovery for spoken language translation
Author :
Paulik, Matthias ; Rao, Sharath ; Lane, Ian ; Vogel, Stephan ; Schultz, Tanja
Author_Institution :
Carnegie Mellon Univ., Pittsburgh, PA
fDate :
March 31 - April 4, 2008
Abstract :
Sentence segmentation and punctuation recovery are critical components for effective spoken language translation (SLT). In this paper, we describe our recent work on sentence segmentation and punctuation recovery for three language pairs: English-to-Spanish, Arabic-to-English, and Chinese-to-English. We show that the proposed approach works equally well across these very different language pairs. Furthermore, we introduce two features, computed from the translation beam-search lattice, that indicate whether phrasal and target language model context is jeopardized when segmenting at a given word boundary. These features enable us to introduce short intra-sentence segments without degrading translation performance.
Keywords :
language translation; natural languages; speech recognition; Arabic-to-English; Chinese-to-English; English-to-Spanish; punctuation recovery; sentence segmentation; spoken language translation; translation beam-search lattice; Automatic speech recognition; Context modeling; Data mining; Humans; Interactive systems; Laboratories; Lattices; Natural languages; System testing; Training data; Punctuation Recovery; Sentence Segmentation; Spoken Language Translation; Tight Coupling;
Conference_Title :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-1483-3
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2008.4518807