Control of spectral dynamics in concatenative speech synthesis

Author

Wouters, Johan ; Macon, Michael W.

Author_Institution

Center for Spoken Language Understanding, Oregon Graduate Inst. of Sci. & Technol., Beaverton, OR, USA

Volume

9

Issue

1

fYear

2001

fDate

1/1/2001 12:00:00 AM

Firstpage

30

Lastpage

38

Abstract

Current speech synthesis methods based on the concatenation of waveform units can produce highly intelligible speech capturing the identity of a particular speaker. However, the quality of concatenated speech often suffers from discontinuities between the acoustic units, due to contextual differences and variations in speaking style across the database. In this paper, we present methods to spectrally modify speech units in a concatenative synthesizer to correspond more closely to the acoustic transitions observed in natural speech. First, a technique called “unit fusion” is proposed to reduce spectral mismatch between units. In addition to concatenation units, a second, independent tier of units is selected that defines the desired spectral dynamics at concatenation points. Both unit tiers are “fused” to obtain natural transitions throughout the synthesized utterance. The unit fusion method is further extended to control the perceived degree of articulation of concatenated units. A signal processing technique based on sinusoidal modeling is also presented that enables high-quality resynthesis of units with a modified spectral shape.

Keywords

acoustic signal processing; spectral analysis; speech intelligibility; speech synthesis; acoustic transitions; acoustic units discontinuities; articulation control; concatenated speech quality; concatenation units; concatenative speech synthesis; concatenative synthesizer; database; high intelligible speech; high-quality resynthesis; modified spectral shape; natural speech; signal processing; sinusoidal modeling; speaking style variations; spectral dynamics control; spectral mismatch reduction; unit fusion; unit fusion method; waveform units; Concatenated codes; Databases; Loudspeakers; Natural languages; Runtime; Signal processing; Signal synthesis; Spectral shape; Speech synthesis; Synthesizers;

fLanguage

English

Journal_Title

Speech and Audio Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/89.890069

Filename

890069