TD-PSOLA versus harmonic plus noise model in diphone based speech synthesis

Author

Syrdal, Ann ; Stylianou, Yannis ; Garrison, Laurie ; Conkie, Alistair ; Schroeter, Juergen

Author_Institution

Res. Labs., AT&T Labs., Florham Park, NJ, USA

Volume

1

fYear

1998

fDate

12-15 May 1998

Firstpage

273

Abstract

In an effort to select a speech representation for our next generation concatenative text-to-speech synthesizer, the use of two candidates is investigated: TD-PSOLA and the harmonic plus noise model (HNM). A formal listening test has been conducted and the two candidates have been rated regarding intelligibility, naturalness and pleasantness. The capacity for database compression and the computational load are also discussed. The results show that HNM consistently outperforms TD-PSOLA in all the above features except for computational load. HNM allows for high-quality speech synthesis without smoothing problems at the segmental boundaries and without buzziness or other oddities observed with TD-PSOLA.

Keywords

acoustic noise; speech intelligibility; speech synthesis; HNM; TD-PSOLA; buzziness; computational load; database compression; diphone based speech synthesis; formal listening test; harmonic plus noise model; high-quality speech synthesis; intelligibility; naturalness; next generation concatenative text-to-speech synthesizer; pleasantness; segmental boundaries; speech representation; Linear predictive coding; Man machine systems; Smoothing methods; Spatial databases; Speech analysis; Speech enhancement; Synthesizers; Testing

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on

Conference_Location

Seattle, WA

ISSN

1520-6149

Print_ISBN

0-7803-4428-6

Type

conf

DOI

10.1109/ICASSP.1998.674420

Filename

674420