DocumentCode :
3427555
Title :
Sentence segmentation and punctuation recovery for spoken language translation
Author :
Paulik, Matthias ; Rao, Sharath ; Lane, Ian ; Vogel, Stephan ; Schultz, Tanja
Author_Institution :
Carnegie Mellon Univ., Pittsburgh, PA
fYear :
2008
fDate :
March 31 - April 4, 2008
Firstpage :
5105
Lastpage :
5108
Abstract :
Sentence segmentation and punctuation recovery are critical components for effective spoken language translation (SLT). In this paper we describe our recent work on sentence segmentation and punctuation recovery for three different language pairs, namely for English-to-Spanish, Arabic-to-English and Chinese-to-English. We show that the proposed approach works equally well in these very different language pairs. Furthermore, we introduce two features computed from the translation beam-search lattice that indicate if phrasal and target language model context is jeopardized when segmenting at a given word boundary. These features enable us to introduce short intra-sentence segments without degrading translation performance.
Keywords :
language translation; natural languages; speech recognition; Arabic-to-English; Chinese-to-English; English-to-Spanish; punctuation recovery; sentence segmentation; spoken language translation; translation beam-search lattice; Automatic speech recognition; Context modeling; Data mining; Humans; Interactive systems; Laboratories; Lattices; System testing; Training data; Tight Coupling
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
ISSN :
1520-6149
Print_ISBN :
978-1-4244-1483-3
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2008.4518807
Filename :
4518807