DocumentCode :
1161199
Title :
Comparative study on corpora for speech translation
Author :
Kikui, Genichiro ; Yamamoto, Seiichi ; Takezawa, Toshiyuki ; Sumita, Eiichiro
Author_Institution :
ATR Spoken Language Commun. Res. Labs., Kyoto
Volume :
14
Issue :
5
fYear :
2006
Firstpage :
1674
Lastpage :
1682
Abstract :
This paper investigates issues in preparing corpora for developing speech-to-speech translation (S2ST). It is impractical to create a broad-coverage parallel corpus only from dialog speech. An alternative approach is to have bilingual experts write conversational-style texts in the target domain, with translations. There is, however, a risk of losing fidelity to the actual utterances. This paper focuses on balancing a tradeoff between these two kinds of corpora through the analysis of two newly developed corpora in the travel domain: a bilingual parallel corpus with 420 K utterances and a collection of in-domain dialogs using actual S2ST systems. We found that the first corpus is effective for covering utterances in the second corpus if complemented with a small number of utterances taken from monolingual dialogs. We also found that characteristics of in-domain utterances become closer to those of the first corpus when more restrictive conditions and instructions to speakers are given. These results suggest the possibility of a bootstrap style of development for corpora and S2ST systems, where an initial S2ST system is developed with parallel texts and is then gradually improved with in-domain utterances collected by the system as restrictions are relaxed.
Keywords :
language translation; speech processing; bilingual experts; bilingual parallel corpus; conversational-style texts; corpora; dialog speech; in-domain utterances; speech-to-speech translation; travel domain; Communications technology; Global communication; Humans; Laboratories; Natural languages; Parameter estimation; Speech recognition; Speech synthesis; State estimation; Strontium; Corpus; machine translation; speech translation; spoken dialog;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2006.878262
Filename :
1677987