DocumentCode
394299
Title
Inversion of F0 model for natural-sounding speech synthesis
Author
Rossi, Pierluigi Salvo ; Palmieri, Francesco ; Cutugno, Francesco
Author_Institution
Dipt. di Inf. e Sistemistica, Naples Univ., Italy
Volume
1
fYear
2003
fDate
6-10 April 2003
Abstract
Natural-sounding speech synthesizers require information from a model that quantitatively describes prosody. H. Fujisaki's model (see "Dynamic Characteristics of Voice Fundamental Frequency in Speech and Singing", The Production of Speech, Springer-Verlag, p. 39-47, 1983) has shown considerable accuracy on many languages (Fujisaki et al., IEEE Int. Conf. on Acoustics, Speech and Sig. Processing, vol. 2, p. 211-14, 1993; Fujisaki and Ohno, S., Fourth Int. Conf. on Sig. Processing, vol. 1, p. 714-17, 1998). We propose a method for estimating the parameters of Fujisaki's model, i.e., an inversion method, based on the relative extrema of the pitch contour and a gradient algorithm refinement procedure. Preliminary results show excellent performance of the proposed method in matching the pitch contours. Preliminary synthesis results using the obtained features are very encouraging.
Keywords
feature extraction; gradient methods; natural languages; parameter estimation; speech synthesis; F0 model inversion; Italian continuous speech; fundamental frequency; gradient algorithm refinement procedure; inversion methods; model feature extraction; natural-sounding speech synthesis; pitch contour; prosody; Filtering; Filters; Fluctuations; Inverse problems; Mean square error methods; Solids; Testing; Timing
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-7663-3
Type
conf
DOI
10.1109/ICASSP.2003.1198832
Filename
1198832