Title :
Expressive synthesis: how crucial is voice quality?
Author :
Gobl, Christer ; Bennett, Eva ; Ní Chasaide, Ailbhe
Author_Institution :
Centre for Language and Communication Studies, Trinity College, Dublin, Ireland
Abstract :
This paper compares the emotive colouring that can be achieved in synthesis by f0 manipulations alone ('f0 only') with that achieved by manipulating f0 together with voice quality ('VQ+f0'), and asks how crucial large f0 excursions are in signalling strong emotions. Are they overwhelmingly important, with voice quality contributing mainly to finer distinctions for milder affects? Or are voice quality and large f0 differences both required for the strong emotions? The 'VQ+f0' stimuli were based on an utterance synthesised using the LF voice source in KLSYN88 with breathy, whispery, lax-creaky, modal, tense and harsh voice qualities (Gobl et al. (2002)); these were further manipulated to replicate the f0 differences described in Mozziconacci et al. (1995) for six emotions, each matched to the most appropriate voice quality. The 'f0 only' stimuli used the same set of f0 contours but retained the source settings for modal voice. Ten listeners rated the affective colouring of the stimuli on a seven-point scale, in terms of pairs of opposite attributes. For both strong and milder affects, the 'VQ+f0' stimuli achieved much higher ratings than the 'f0 only' stimuli, which were relatively ineffective. Implications for the synthesis of expressive speech are discussed.
Keywords :
speech processing; speech synthesis; LF voice source; emotive colouring; expressive synthesis; voice quality; Educational institutions; Mood; Signal synthesis; Testing
Conference_Title :
Proceedings of 2002 IEEE Workshop on Speech Synthesis
Print_ISBN :
0-7803-7395-2
DOI :
10.1109/WSS.2002.1224380