• DocumentCode
    3570548
  • Title

    Collection and analysis of a Japanese-English emphasized speech corpora

  • Author

    Do Quoc Truong ; Neubig, Graham ; Sakti, Sakriani ; Toda, Tomoki ; Nakamura, Satoshi

  • Author_Institution
    Nara Inst. of Sci. & Technol., Ikoma, Japan
  • fYear
    2014
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Speech-to-speech (S2S) translation [10] is gradually starting to break down the language barrier, giving people who speak different languages the opportunity to understand each other. However, one limitation of current S2S systems is that they usually do not translate the paralinguistic information included in the input speech. Among the various types of paralinguistic information, we focus on emphasis, which is used to convey the focus of the sentence, the emotion of the speaker, or other high-level information useful for communication. This paper describes the collection of Japanese-English emphasized speech corpora that can be used to study how emphasis is expressed across languages. We constructed two corpora, one containing digit strings and one containing utterances from a conversational setting. Speakers who can speak both Japanese and English were selected for the recording. We collected 500 parallel digit strings for the digit corpus and 2030 parallel sentences for the conversation corpus. The corpora may be used to analyze emphasis within one language or across languages, or to develop emphasized speech translation systems.
  • Keywords
    language translation; natural language processing; speech processing; string matching; Japanese-English emphasized speech corpora; S2S translation; conversation corpus; digit corpus; language barrier; paralinguistic information; parallel digit string; parallel sentence; speaker emotion; speech translation system; speech-to-speech translation; Hidden Markov models; Materials; Prototypes; Speech; Speech synthesis; Stress; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014 17th Oriental Chapter of the International Committee for the
  • Type

    conf

  • DOI
    10.1109/ICSDA.2014.7051424
  • Filename
    7051424