• DocumentCode
    3246310
  • Title
    Approach toward speech-to-speech translation system by using a collection of sentences and utterances
  • Author
    Sumita, Eiichiro ; Nakaiwa, Hiromi ; Kikui, Genichiro ; Yamamoto, Seiichi
  • Author_Institution
    ATR Spoken Language Translation Res. Labs., Kyoto, Japan
  • fYear
    2003
  • fDate
    30 Nov.-3 Dec. 2003
  • Firstpage
    652
  • Lastpage
    657
  • Abstract
    Corpus-based technology is very promising for speech-to-speech translation. However, it is prohibitively expensive to build the vital resource: a large-scale corpus of bilingual dialogues covering many domains. We propose to substitute for it a combination of two different types of bilingual corpora: (1) a large-scale collection of basic sentences that covers many domains; and (2) a small-scale collection of spoken dialogues that reflects the characteristics of spoken utterances. With these two corpora, we have been building a translation module for a speech-to-speech translation system. By using the basic sentence corpus, we have achieved high-quality translations with several machine-learning approaches. Based on an analysis of the spoken dialogue corpus, we found that splitting utterances into parts and concatenating the translated parts is an effective way to translate the longer utterances that are inherent in a spoken dialogue.
  • Keywords
    language translation; learning (artificial intelligence); speech recognition; speech synthesis; bilingual dialogue corpus; corpus-based technology; machine learning methods; sentence collection method; speech-to-speech translation system; spoken dialogue utterance splitting; utterance collection method; Cities and towns; Humans; Laboratories; Large-scale systems; Machine learning; Natural languages; Oral communication; Speech; System testing; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on
  • Print_ISBN
    0-7803-7980-2
  • Type
    conf
  • DOI
    10.1109/ASRU.2003.1318517
  • Filename
    1318517