• DocumentCode
    980740
  • Title

    An objective and subjective study of the role of semantics and prosodic features in building corpora for emotional TTS

  • Author

    Navas, Eva ; Hernáez, Inmaculada ; Luengo, Iker

  • Author_Institution
    Dept. of Electron. & Telecommun., Univ. of the Basque Country, Bilbao
  • Volume
    14
  • Issue
    4
  • fYear
    2006
  • fDate
    7/1/2006 12:00:00 AM
  • Firstpage
    1117
  • Lastpage
    1127
  • Abstract
    Building a text corpus suitable for corpus-based speech synthesis is a time-consuming process that usually requires some human intervention to select the desired phonetic content and the necessary variety of prosodic contexts. If an emotional text-to-speech (TTS) system is desired, the complexity of the corpus generation process increases. This paper presents a study aiming to validate or reject the use of a semantically neutral text corpus for the recording of both neutral and emotional (acted) speech. The use of this kind of text would eliminate the need to include semantically emotional texts in the corpus. The study was performed for the Basque language, through subjective and objective comparisons between the prosodic characteristics of emotional speech recorded using both semantically neutral and emotional texts. At the same time, the experiments allow for an evaluation of the capability of prosody to carry emotional information in the Basque language. Prosody manipulation is the most common processing tool used in concatenative TTS. Experiments on automatic recognition of the emotions considered in this paper (the "Big Six" emotions) show that prosody is an important emotional indicator, but cannot be the only manipulated parameter in an emotional TTS system, at least not for all the emotions. Resynthesis experiments transferring prosody from emotional to neutral speech have also been performed. They corroborate the results and support the use of a neutral-semantic-content text in databases for emotional speech synthesis.
  • Keywords
    emotion recognition; linguistics; speech synthesis; Basque language; corpus generation process; corpus-based speech synthesis; emotional indicator; emotional speech recording; emotional speech synthesis; emotional text-to-speech system; emotional texts; neutral-semantic-content text; phonetic content; prosodic characteristics; prosody manipulation; semantically neutral text corpus; semantically neutral texts; text corpus building; Avatars; Buildings; Databases; Emotion recognition; Europe; Humans; Natural languages; Performance evaluation; Speech analysis; Speech synthesis; Evaluation of expressivity; prosody analysis; speech corpus design
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2006.876121
  • Filename
    1643641