Regional Information Center for Science and Technology - A hybrid model for text-to-speech synthesis

DocumentCode :

1420387

Title :

A hybrid model for text-to-speech synthesis

Author :

Violaro, Fábio ; Boëffard, Olivier

Author_Institution :

UNICAMP, Campinas, Brazil

Volume :

Issue :

fYear :

1998

fDate :

9/1/1998

Firstpage :

426

Lastpage :

434

Abstract :

This paper describes a hybrid model developed for high-quality, concatenation-based text-to-speech synthesis. The speech signal undergoes pitch-synchronous analysis and is decomposed into a harmonic component, with a variable maximum frequency, plus a noise component. The harmonic component is modeled as a sum of sinusoids whose frequencies are multiples of the pitch. The noise component is modeled as a random excitation applied to an LPC filter. In unvoiced segments, the harmonic component is set to zero. When the pitch is modified, a new set of harmonic parameters is obtained by resampling the spectral envelope at the new harmonic frequencies. To synthesize the harmonic component under duration and/or pitch modifications, a phase correction is introduced into the harmonic parameters. The harmonic component is synthesized with the sinusoidal model, and the noise component with the LPC model combined with an overlap-and-add procedure. This hybrid model enables independent and continuous control of the duration and pitch of the synthesized speech. Comparative evaluation tests in a text-to-speech environment have shown that the hybrid model performs better than the time-domain pitch-synchronous overlap-add (TD-PSOLA) model.
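The decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the linear interpolation used to resample the envelope, the first-order LPC coefficient, and all numeric values are assumptions chosen for illustration. The phase correction and the overlap-and-add step mentioned in the abstract are omitted.

```python
import numpy as np

def harmonic_frame(f0, amps, phases, n_samples, fs):
    """Harmonic component: sum of sinusoids at integer multiples of the pitch f0 (Hz)."""
    t = np.arange(n_samples) / fs
    k = np.arange(1, len(amps) + 1)[:, None]  # harmonic indices 1..K
    amps = np.asarray(amps, dtype=float)[:, None]
    phases = np.asarray(phases, dtype=float)[:, None]
    return np.sum(amps * np.cos(2 * np.pi * k * f0 * t + phases), axis=0)

def resample_envelope(f0_old, amps_old, f0_new, f_max):
    """Pitch modification: read the amplitude envelope at the new harmonic frequencies.

    Linear interpolation is an assumption; np.interp holds the end values
    constant outside the range covered by the old harmonics.
    """
    freqs_old = f0_old * np.arange(1, len(amps_old) + 1)
    n_new = int(f_max // f0_new)  # harmonics up to the maximum harmonic frequency
    freqs_new = f0_new * np.arange(1, n_new + 1)
    return np.interp(freqs_new, freqs_old, amps_old)

def lpc_noise(a, gain, n_samples, rng):
    """Noise component: random excitation through the all-pole filter 1 / (1 + sum a[i] z^-i)."""
    e = gain * rng.standard_normal(n_samples)
    y = np.zeros(n_samples)
    for n in range(n_samples):
        acc = e[n]
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= ai * y[n - i]
        y[n] = acc
    return y

# Illustrative values only (not taken from the paper).
fs = 16000                       # sampling rate in Hz
f0 = 100.0                       # pitch in Hz
amps = 1.0 / np.arange(1, 41)    # 40 harmonics, so the maximum harmonic frequency is 4 kHz
phases = np.zeros(40)
rng = np.random.default_rng(0)

# Voiced frame: harmonic part plus filtered noise.
voiced = harmonic_frame(f0, amps, phases, 320, fs) + lpc_noise([-0.5], 0.01, 320, rng)

# Unvoiced frame: the harmonic component is zero, so only the noise remains.
unvoiced = lpc_noise([-0.5], 0.01, 320, rng)

# Raising the pitch to 125 Hz leaves fewer harmonics below the same 4 kHz limit.
new_amps = resample_envelope(f0, amps, 125.0, 4000.0)
```

Because the amplitudes are re-read from the envelope instead of being carried over by harmonic index, a pitch change moves the harmonics while the spectral shape stays in place, which is the purpose of the resampling step in the abstract.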

Keywords :

filtering theory; harmonic analysis; linear predictive coding; noise; signal sampling; spectral analysis; speech coding; speech intelligibility; speech synthesis; LPC filter; concatenation-based text-to-speech synthesis; evaluation tests; harmonic component; harmonic frequencies; harmonic parameters; hybrid model; noise component; noise synthesis; phase correction; pitch duration; pitch modifications; pitch-synchronous analysis; random excitation; sinusoidal model; spectrum envelope resampling; speech quality; speech signal; time-domain pitch synchronous overlap-add; unvoiced segments; variable maximum frequency; Frequency; Harmonic analysis; Linear predictive coding; Power harmonic filters; Signal analysis; Signal synthesis; Speech analysis; Speech enhancement; Speech synthesis; Working environment noise;

fLanguage :

English

Journal_Title :

Speech and Audio Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1063-6676

Type :

jour

DOI :

10.1109/89.709668

Filename :

709668

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1420387