Title :
A Statistical Approach for Modeling Prosody Features using POS Tags for Emotional Speech Synthesis
Author :
Bulut, Murtaza ; Lee, Sungbok ; Narayanan, Shrikanth
Author_Institution :
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
Abstract :
Deriving statistical models for emotional speech processing is a challenging problem because of the highly varying nature of emotion expressions. We address this problem by modeling prosodic parameter differences at the part-of-speech (POS) level in emotional utterances, for the purpose of emotional speech synthesis. Synthesis at the POS level is appealing because POS tags carry salient information conveying speech prominence. Analysis of energy, duration and F0 differences between matching neutral-angry, neutral-sad and neutral-happy emotional utterance pairs shows that Gaussian distributions can be used to model the parameter differences. Pairwise comparisons of POS features reveal that the normalized mean and median energy of sad POS tags are more likely to be larger than those of neutral, angry or happy POS tags. They also show that, for particular tags, the angry emotion is more likely to have a higher F0 median than the happy emotion, and the sad emotion a higher F0 median than the neutral emotion. Experiments on converting neutral speech into emotional speech using the Gaussian probability functions provide helpful insights into the application of statistical models in speech synthesis.
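As a rough illustration of the idea summarized in the abstract, the sketch below fits one Gaussian per POS tag to the difference between an emotional and a neutral value of a single prosodic feature, then shifts neutral values by that difference. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy numbers, the use of a simple additive difference, and the choice to leave unseen tags unchanged are all hypothetical.

```python
import random
import statistics


def fit_difference_models(pairs):
    """Fit a Gaussian (mean, std) per POS tag to emotional-minus-neutral differences.

    pairs: iterable of (pos_tag, neutral_value, emotional_value) for one feature,
    e.g. normalized energy. The additive difference is an assumption made here.
    """
    diffs = {}
    for tag, neutral, emotional in pairs:
        diffs.setdefault(tag, []).append(emotional - neutral)

    models = {}
    for tag, values in diffs.items():
        mean = statistics.fmean(values)
        # A sample standard deviation needs at least two observations.
        std = statistics.stdev(values) if len(values) > 1 else 0.0
        models[tag] = (mean, std)
    return models


def convert(neutral_features, models, rng=None):
    """Shift neutral feature values toward the target emotion.

    neutral_features: list of (pos_tag, value).
    With rng=None the mean difference is added; otherwise a difference is
    sampled from the tag's Gaussian. Tags without a model are left unchanged.
    """
    converted = []
    for tag, value in neutral_features:
        mean, std = models.get(tag, (0.0, 0.0))
        delta = mean if rng is None else rng.gauss(mean, std)
        converted.append((tag, value + delta))
    return converted


if __name__ == "__main__":
    # Hypothetical toy data: (POS tag, neutral energy, emotional energy).
    toy_pairs = [
        ("NN", 1.0, 1.4),
        ("NN", 1.2, 1.8),
        ("VB", 0.9, 0.8),
        ("VB", 1.1, 0.9),
    ]
    toy_models = fit_difference_models(toy_pairs)
    print(toy_models)
    print(convert([("NN", 1.0), ("VB", 1.0)], toy_models))
    print(convert([("NN", 1.0), ("VB", 1.0)], toy_models, rng=random.Random(0)))
```

Adding the mean difference gives a deterministic conversion, while sampling from the fitted Gaussian reflects the variability of emotional expression that the abstract highlights; which of the two, if either, matches the paper's experiments is not stated in this record.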
Keywords :
Gaussian distribution; speech synthesis; statistical analysis; F0 differences; Gaussian distributions; Gaussian probability functions; POS tags; emotional speech processing; emotional speech synthesis; emotional utterances; median energy; neutral speech; normalized mean; pairwise comparisons; part-of-speech level; prosodic parameter differences; prosody feature; speech prominence; statistical approach; statistical models; Databases; Frequency; Hidden Markov models; Probability; Speech analysis; Speech processing; Stress; POS; conversion; emotion; energy; prosody
Conference_Title :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0727-3
DOI :
10.1109/ICASSP.2007.367300