DocumentCode :
1420387
Title :
A hybrid model for text-to-speech synthesis
Author :
Violaro, Fábio ; Boëffard, Olivier
Author_Institution :
UNICAMP, Campinas, Brazil
Volume :
6
Issue :
5
fYear :
1998
fDate :
9/1/1998
Firstpage :
426
Lastpage :
434
Abstract :
This paper describes a hybrid model developed for high-quality, concatenation-based, text-to-speech synthesis. The speech signal is subjected to a pitch-synchronous analysis and decomposed into a harmonic component, with a variable maximum frequency, plus a noise component. The harmonic component is modeled as a sum of sinusoids with frequencies that are multiples of the pitch. The noise component is modeled as a random excitation applied to an LPC filter. In unvoiced segments, the harmonic component is set to zero. In the presence of pitch modifications, a new set of harmonic parameters is evaluated by resampling the spectrum envelope at the new harmonic frequencies. For the synthesis of the harmonic component in the presence of duration and/or pitch modifications, a phase correction is introduced into the harmonic parameters. The sinusoidal model of synthesis is used for the harmonic component, and the LPC model combined with an overlap-and-add procedure is used for the noise synthesis. This hybrid model enables independent and continuous control of the duration and pitch of the synthesized speech. Comparative evaluation tests made in a text-to-speech environment have shown that the hybrid model performs better than the time-domain pitch-synchronous overlap-add (TD-PSOLA) model.
Keywords :
filtering theory; harmonic analysis; linear predictive coding; noise; signal sampling; spectral analysis; speech coding; speech intelligibility; speech synthesis; LPC filter; concatenation-based text-to-speech synthesis; evaluation tests; harmonic component; harmonic frequencies; harmonic parameters; hybrid model; noise component; noise synthesis; phase correction; pitch duration; pitch modifications; pitch-synchronous analysis; random excitation; sinusoidal model; spectrum envelope resampling; speech quality; speech signal; time-domain pitch synchronous overlap-add; unvoiced segments; variable maximum frequency; Frequency; Harmonic analysis; Linear predictive coding; Power harmonic filters; Signal analysis; Signal synthesis; Speech analysis; Speech enhancement; Speech synthesis; Working environment noise;
fLanguage :
English
Journal_Title :
Speech and Audio Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-6676
Type :
jour
DOI :
10.1109/89.709668
Filename :
709668
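Illustrative sketch :
The decomposition described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the linear interpolation of the envelope, and the frame-level interface are all assumptions made here, and the phase correction and overlap-add steps mentioned in the abstract are left out.

```python
import math
import random


def harmonic_component(f0, amplitudes, phases, n_samples, sample_rate):
    """Sum of sinusoids at integer multiples of the pitch f0 (Hz)."""
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        out.append(sum(a * math.cos(2.0 * math.pi * (k + 1) * f0 * t + p)
                       for k, (a, p) in enumerate(zip(amplitudes, phases))))
    return out


def noise_component(lpc_coeffs, gain, n_samples, rng):
    """Gaussian excitation through an all-pole filter 1 / A(z),
    where A(z) = 1 + a1*z^-1 + ... + ap*z^-p (sign convention assumed)."""
    out = []
    for n in range(n_samples):
        y = gain * rng.gauss(0.0, 1.0)
        for i, a in enumerate(lpc_coeffs, start=1):
            if n - i >= 0:
                y -= a * out[n - i]
        out.append(y)
    return out


def resample_envelope(f0_old, amplitudes, f0_new, f_max):
    """Re-evaluate harmonic amplitudes at multiples of a new pitch.

    Assumption: the spectrum envelope is approximated by linear
    interpolation between the original harmonic amplitudes.
    """
    freqs = [(k + 1) * f0_old for k in range(len(amplitudes))]
    new_amps = []
    k = 1
    while k * f0_new <= f_max:
        f = k * f0_new
        if f <= freqs[0]:
            new_amps.append(amplitudes[0])
        elif f >= freqs[-1]:
            new_amps.append(amplitudes[-1])
        else:
            # freqs[j - 1] <= f < freqs[j]
            j = max(1, min(int(f // f0_old), len(amplitudes) - 1))
            w = (f - freqs[j - 1]) / f0_old
            new_amps.append((1.0 - w) * amplitudes[j - 1] + w * amplitudes[j])
        k += 1
    return new_amps


def synthesize_frame(voiced, f0, amplitudes, phases, lpc_coeffs, gain,
                     n_samples, sample_rate, rng):
    """Hybrid frame: harmonic part (zero when unvoiced) plus noise part."""
    noise = noise_component(lpc_coeffs, gain, n_samples, rng)
    if not voiced:
        return noise
    harm = harmonic_component(f0, amplitudes, phases, n_samples, sample_rate)
    return [h + v for h, v in zip(harm, noise)]
```

For a pitch change, one would call `resample_envelope` first and pass the resulting amplitudes, with the new `f0`, to `synthesize_frame`. A real system would also need matching phases for the new harmonics, which this sketch does not attempt.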