DocumentCode
900394
Title
Prosody modification using instants of significant excitation
Author
Rao, K. Sreenivasa ; Yegnanarayana, B.
Author_Institution
Dept. of Electron. & Commun. Eng., Indian Inst. of Technol. Guwahati, India
Volume
14
Issue
3
fYear
2006
fDate
5/1/2006 12:00:00 AM
Firstpage
972
Lastpage
980
Abstract
Prosody modification involves changing the pitch and duration of speech without affecting the message and naturalness. This paper proposes a method for prosody (pitch and duration) modification using the instants of significant excitation of the vocal tract system during the production of speech. The instants of significant excitation correspond to the instants of glottal closure (epochs) in the case of voiced speech, and to some random excitations, such as the onset of a burst, in the case of nonvoiced speech. Instants of significant excitation are computed from the linear prediction (LP) residual of speech signals by using the property of average group-delay of minimum phase signals. The modification of pitch and duration is achieved by manipulating the LP residual using knowledge of the instants of significant excitation. The modified residual is used to excite the time-varying filter, whose parameters are derived from the original speech signal. The perceptual quality of the synthesized speech is good, without any significant distortion. The proposed method is evaluated using waveforms, spectrograms, and listening tests. The performance of the method is compared with that of the linear prediction pitch synchronous overlap and add (LP-PSOLA) method, which is another method for prosody manipulation based on the modification of the LP residual. The original and the synthesized speech signals obtained by the proposed method and by the LP-PSOLA method are available for listening at http://speech.cs.iitm.ernet.in/Main/result/prosody.html.
Keywords
speech processing; speech synthesis; time-varying filters; glottal closure instants; linear prediction pitch synchronous overlap and add method; minimum phase signals; prosody modification; significant excitation instants; speech signals; synthesized speech; time-varying filter; vocal tract system; Bandwidth; Filters; Frequency estimation; Helium; Production systems; Shape; Signal synthesis; Spectrogram; Speech synthesis; Testing; Duration; LP residual; excitation source; instants of significant excitation (epochs); linear prediction pitch synchronous overlap and add (LP-PSOLA); pitch period; prosody modification;
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TSA.2005.858051
Filename
1621209