DocumentCode
2705738
Title
A Statistical Approach for Modeling Prosody Features using POS Tags for Emotional Speech Synthesis
Author
Bulut, Murtaza ; Lee, Sungbok ; Narayanan, Shrikanth
Author_Institution
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
Volume
4
fYear
2007
fDate
15-20 April 2007
Abstract
Deriving statistical models for emotional speech processing is a challenging problem because of the highly varying nature of emotion expressions. We address this problem by modeling prosodic parameter differences at the part-of-speech (POS) level for emotional utterances, for the purpose of emotional speech synthesis. Synthesis at the POS level is appealing because POS tags carry salient information conveying speech prominence. Analysis of energy, duration and F0 differences between matching neutral-angry, neutral-sad and neutral-happy emotional utterance pairs shows that Gaussian distributions can be used to model the parameter differences. Pairwise comparisons of POS features reveal that the normalized mean and median energy of sad POS tags are more likely to be larger than those of neutral, angry or happy POS tags. They also show that, for particular tags, angry emotion is more likely to have a higher F0 median than happy emotion, and sad emotion is more likely to have a higher F0 median than neutral emotion. Experiments on converting neutral speech into emotional speech using the Gaussian probability functions provide helpful insights into the application of statistical models in speech synthesis.
Keywords
Gaussian distribution; speech synthesis; statistical analysis; F0 differences; Gaussian distributions; Gaussian probability functions; POS tags; emotional speech processing; emotional speech synthesis; emotional utterances; median energy; neutral speech; normalized mean; pairwise comparisons; part-of-speech level; prosodic parameter differences; prosody feature; speech prominence; statistical approach; statistical models; Databases; Frequency; Hidden Markov models; Probability; Speech analysis; Speech processing; Stress; POS; conversion; emotion; energy; prosody
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location
Honolulu, HI
ISSN
1520-6149
Print_ISBN
1-4244-0727-3
Type
conf
DOI
10.1109/ICASSP.2007.367300
Filename
4218331