Applying the harmonic plus noise model in concatenative speech synthesis

Author

Stylianou, Yannis

Author_Institution

Shannon Labs., AT&T Labs.-Res., Florham Park, NJ, USA

Volume

9

Issue

1

fYear

2001

fDate

1/1/2001

Firstpage

21

Lastpage

29

Abstract

This paper describes the application of the harmonic plus noise model (HNM) for concatenative text-to-speech (TTS) synthesis. In the context of HNM, speech signals are represented as a time-varying harmonic component plus a modulated noise component. The decomposition of a speech signal into these two components allows for more natural-sounding modifications of the signal (e.g., by using different and better adapted schemes to modify each component). The parametric representation of speech using HNM provides a straightforward way of smoothing discontinuities of acoustic units around concatenation points. Formal listening tests have shown that HNM provides high-quality speech synthesis while outperforming other models for synthesis (e.g., TD-PSOLA) in intelligibility, naturalness, and pleasantness.

Keywords

acoustic signal processing; harmonics; noise; signal representation; smoothing methods; speech intelligibility; speech synthesis; acoustic units; adapted schemes; concatenative text-to-speech synthesis; discontinuities smoothing; formal listening tests; harmonic plus noise model; high-quality speech synthesis; modulated noise component; natural-sounding signal modifications; parametric speech representation; speech naturalness; speech pleasantness; speech signal decomposition; speech signals representation; time-varying harmonic component; Acoustic noise; Context modeling; Degradation; Filters; Linear predictive coding; Phase estimation; Signal synthesis; Speech processing; Transaction databases

fLanguage

English

Journal_Title

Speech and Audio Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1063-6676

Type

jour

DOI

10.1109/89.890068

Filename

890068