Accent Group modeling for improved prosody in statistical parametric speech synthesis

Author

Krishna Anumanchipalli, Gopala ; Oliveira, Luis C. ; Black, Alan W.

Author_Institution

Language Technol. Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA

fYear

2013

Firstpage

6890

Lastpage

6894

Abstract

This paper presents an "Accent Group" based intonation model for statistical parametric speech synthesis. We propose an approach to automatically model phonetic realizations of fundamental frequency (F0) contours as a sequence of intonational events anchored to a group of syllables (an Accent Group). We train a speaker-specific accent grouping model using a stochastic context-free grammar and contextual decision trees on the syllables. This model is used to "parse" unseen text into its constituent accent groups, over each of which an appropriate intonation is predicted. The performance of the model is shown objectively and subjectively on a variety of prosodically diverse tasks: read speech, news broadcasts, and audiobooks.

Keywords

context-free grammars; decision trees; speech synthesis; statistical analysis; stochastic processes; accent group-based intonation model; audio books; automatic phonetic realization modeling; contextual decision trees; fundamental frequency contours; improved prosody; intonational event sequence; news broadcast; statistical parametric speech synthesis; stochastic context free grammar; syllables group; text parsing; Data models; Databases; Educational institutions; Hidden Markov models; Predictive models; Speech; Speech synthesis; Accent Group; Foot; Intonation Modeling; Phonology; Prosody; Statistical Parametric Speech Synthesis;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location

Vancouver, BC

ISSN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2013.6638997

Filename

6638997