A superpositional model applied to F0 parameterization using DCT for text-to-speech synthesis

Author

Stan, Adriana ; Giurgiu, Mircea

Author_Institution

Commun. Dept., Tech. Univ. of Cluj-Napoca, Cluj-Napoca, Romania

fYear

2011

fDate

18-21 May 2011

Firstpage

1

Lastpage

6

Abstract

This paper presents a superpositional model of F0 contours based on a DCT (Discrete Cosine Transform) parameterization. We examine how well the DCT coefficients capture both the fast variations of the F0 contour at syllable level and its overall trend at phrase level. The syllable-level coefficients are computed after subtracting the estimated phrase-level contour from the original contour, so each syllable is treated as having an additive prosodic effect on top of the phrase level. We also compare three decision and regression tree algorithms for clustering and predicting the DCT coefficients. Additional features are chosen with a greedy stepwise feature selection method without backtracking. The results support the proposed method: average squared errors are low, and the synthesized speech shows few or no perceptible errors.
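The two-level decomposition described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the number of coefficients kept at each level (`n_phrase`, `n_syl`), the synthetic contour, and the syllable boundaries are all assumptions, and the contour is assumed to be already interpolated over unvoiced frames.

```python
import numpy as np

def dct_basis(n, k):
    """First k orthonormal DCT-II basis vectors of length n, shape (k, n)."""
    t = (np.arange(n) + 0.5) / n
    basis = np.cos(np.pi * np.outer(np.arange(k), t)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)  # the constant term uses a different scale factor
    return basis

def dct_fit(x, k):
    """Return the first k DCT coefficients of x and the contour rebuilt from them."""
    b = dct_basis(len(x), min(k, len(x)))
    coeffs = b @ x
    return coeffs, b.T @ coeffs

def superpositional_dct(f0, syllable_bounds, n_phrase=3, n_syl=5):
    """Hypothetical helper: phrase-level DCT fit plus per-syllable DCT of the residual."""
    f0 = np.asarray(f0, dtype=float)
    # Phrase level: a few low-order coefficients capture the overall trend.
    phrase_coeffs, phrase_contour = dct_fit(f0, n_phrase)
    # Syllable level: parameterize what remains after removing the phrase contour.
    residual = f0 - phrase_contour
    syl_coeffs = []
    rebuilt = phrase_contour.copy()
    for start, end in syllable_bounds:
        c, approx = dct_fit(residual[start:end], n_syl)
        syl_coeffs.append(c)
        rebuilt[start:end] += approx  # additive superposition of the two levels
    return phrase_coeffs, syl_coeffs, rebuilt

# Synthetic contour (assumed values): declining trend plus faster local movement.
t = np.linspace(0.0, 1.0, 60)
f0 = 130.0 - 25.0 * t + 8.0 * np.sin(2 * np.pi * 6 * t)
bounds = [(0, 10), (10, 25), (25, 40), (40, 60)]  # made-up syllable frame ranges

phrase_c, syl_c, rebuilt = superpositional_dct(f0, bounds)
rmse = float(np.sqrt(np.mean((f0 - rebuilt) ** 2)))
print(f"RMSE between original and rebuilt contour: {rmse:.3f} Hz")
```

In a full system the coefficient vectors, rather than the raw contour, would be the targets that the decision and regression trees predict from text-derived features; that prediction step is not shown here.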

Keywords

decision trees; discrete cosine transforms; regression analysis; speech synthesis; DCT; decision tree; discrete cosine transform; regression tree algorithm; superpositional model; text to speech synthesis; Feature extraction; Prediction algorithms; Speech; Stress; Training; F0 modelling; pitch; prosody

fLanguage

English

Publisher

IEEE

Conference_Titel

Speech Technology and Human-Computer Dialogue (SpeD), 2011 6th Conference on

Conference_Location

Brasov

Print_ISBN

978-1-4577-0440-6

Type

conf

DOI

10.1109/SPED.2011.5940734

Filename

5940734