Inversion of F₀ model for natural-sounding speech synthesis

Author

Rossi, Pierluigi Salvo ; Palmieri, Francesco ; Cutugno, Francesco

Author_Institution

Dipt. di Inf. e Sistemistica, Naples Univ., Italy

Volume

1

fYear

2003

fDate

6-10 April 2003

Abstract

Natural-sounding speech synthesizers require information from a model that quantitatively describes prosody. H. Fujisaki's model (see "Dynamic Characteristics of Voice Fundamental Frequency in Speech and Singing", The Production of Speech, Springer-Verlag, p.39-47, 1983) has proven accurate for many languages (Fujisaki et al., IEEE Int. Conf. on Acoustics, Speech and Sig. Processing, vol.2, p.211-14, 1993; Fujisaki and Ohno, S., Fourth Int. Conf. on Sig. Processing, vol.1, p.714-17, 1998). We propose a method for estimating the parameters of Fujisaki's model, i.e., an inversion method, based on the relative extremes of the pitch contour followed by a gradient-based refinement procedure. Preliminary results show excellent performance of the proposed method in matching pitch contours, and preliminary synthesis experiments using the extracted features are very encouraging.
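For readers unfamiliar with the model being inverted, the following is a minimal Python sketch of the standard Fujisaki forward model, in which phrase and accent commands are superimposed on a baseline frequency Fb in the log-F0 domain, together with a naive detector of the relative extremes of a sampled contour. It is not the authors' inversion procedure, and the gradient refinement step is not shown. The function names are hypothetical, and the defaults alpha = 2.0 /s, beta = 20.0 /s and gamma = 0.9 are assumed values of the order commonly used with this model.

```python
import math


def phrase_component(t, alpha):
    """Phrase-control impulse response: Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_component(t, beta, gamma=0.9):
    """Accent-control step response: Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrases, accents, alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0(t) in Hz from the forward model (assumed default constants).

    ln F0(t) = ln Fb + sum_i Ap_i * Gp(t - T0_i)
                     + sum_j Aa_j * (Ga(t - T1_j) - Ga(t - T2_j))

    phrases: list of (Ap, T0) phrase commands (impulses)
    accents: list of (Aa, T1, T2) accent commands (pedestals from T1 to T2)
    """
    ln_f0 = math.log(fb)
    for ap, t0 in phrases:
        ln_f0 += ap * phrase_component(t - t0, alpha)
    for aa, t1, t2 in accents:
        ln_f0 += aa * (accent_component(t - t1, beta, gamma)
                       - accent_component(t - t2, beta, gamma))
    return math.exp(ln_f0)


def relative_extremes(values):
    """Return (maxima, minima): indices of strict local maxima and minima of a sampled contour."""
    maxima, minima = [], []
    for i in range(1, len(values) - 1):
        if values[i] > values[i - 1] and values[i] > values[i + 1]:
            maxima.append(i)
        elif values[i] < values[i - 1] and values[i] < values[i + 1]:
            minima.append(i)
    return maxima, minima


if __name__ == "__main__":
    # Hypothetical commands: one phrase at t = 0 s, one accent from 0.5 s to 0.8 s.
    times = [k * 0.01 for k in range(200)]
    contour = [fujisaki_f0(t, 100.0, [(0.5, 0.0)], [(0.3, 0.5, 0.8)]) for t in times]
    maxima, minima = relative_extremes(contour)
    print("maxima at t =", [round(times[i], 2) for i in maxima])
    print("minima at t =", [round(times[i], 2) for i in minima])
```

In an inversion setting, the extremes returned by a detector like `relative_extremes` could seed initial command timings, and the forward model would then be evaluated inside an error-minimization loop. How the original method maps extremes to commands is not specified in this record.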

Keywords

feature extraction; gradient methods; natural languages; parameter estimation; speech synthesis; F₀ model inversion; Italian continuous speech; fundamental frequency; gradient algorithm refinement procedure; inversion methods; model feature extraction; natural-sounding speech synthesis; parameter estimation; pitch contour; prosody; Feature extraction; Filtering; Filters; Fluctuations; Inverse problems; Mean square error methods; Solids; Speech synthesis; Testing; Timing;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-7663-3

Type

conf

DOI

10.1109/ICASSP.2003.1198832

Filename

1198832