DocumentCode
1416859
Title
Control of spectral dynamics in concatenative speech synthesis
Author
Wouters, Johan ; Macon, Michael W.
Author_Institution
Center for Spoken Language Understanding, Oregon Graduate Inst. of Sci. & Technol., Beaverton, OR, USA
Volume
9
Issue
1
fYear
2001
fDate
1/1/2001 12:00:00 AM
Firstpage
30
Lastpage
38
Abstract
Current speech synthesis methods based on the concatenation of waveform units can produce highly intelligible speech capturing the identity of a particular speaker. However, the quality of concatenated speech often suffers from discontinuities between the acoustic units, due to contextual differences and variations in speaking style across the database. In this paper, we present methods to spectrally modify speech units in a concatenative synthesizer to correspond more closely to the acoustic transitions observed in natural speech. First, a technique called “unit fusion” is proposed to reduce spectral mismatch between units. In addition to concatenation units, a second, independent tier of units is selected that defines the desired spectral dynamics at concatenation points. Both unit tiers are “fused” to obtain natural transitions throughout the synthesized utterance. The unit fusion method is further extended to control the perceived degree of articulation of concatenated units. A signal processing technique based on sinusoidal modeling is also presented that enables high-quality resynthesis of units with a modified spectral shape.
Keywords
acoustic signal processing; spectral analysis; speech intelligibility; speech synthesis; acoustic transitions; acoustic units discontinuities; articulation control; concatenated speech quality; concatenation units; concatenative speech synthesis; concatenative synthesizer; database; high intelligible speech; high-quality resynthesis; modified spectral shape; natural speech; signal processing; sinusoidal modeling; speaking style variations; spectral dynamics control; spectral mismatch reduction; unit fusion; unit fusion method; waveform units; Concatenated codes; Databases; Loudspeakers; Natural languages; Runtime; Signal processing; Signal synthesis; Spectral shape; Speech synthesis; Synthesizers
fLanguage
English
Journal_Title
Speech and Audio Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1063-6676
Type
jour
DOI
10.1109/89.890069
Filename
890069