Transition-based speech synthesis using neural networks

Author

Corrigan, G. ; Massey, N. ; Schnurr, O.

Author_Institution

Motorola Labs., Schaumburg, IL, USA

Volume

2

fYear

2000

fDate

2000

Abstract

Prior attempts to use neural networks to synthesize speech from a phonetic representation have used the neural network to generate a frame of input to a vocoder. Because this requires the neural network to compute one output for each frame of speech, it can be computationally expensive. An alternative is to model the speech as a series of gestures and let the neural network generate parameters describing the transitions of the vocoder parameters during these gestures. Experiments have shown that acceptable speech quality is produced when each gesture is half of a phonetic segment and the transition model is a set of cubic polynomials describing the variation of each vocoder parameter during the gesture. This results in a significant reduction in computational cost.
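
The transition model described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name expand_gesture, the coefficient ordering (c0, c1, c2, c3), and the normalization of time to the range 0 to 1 within a gesture are all assumptions made for illustration. The sketch assumes the network has already produced one cubic per vocoder parameter for a gesture, and shows how those coefficients could be expanded into per-frame vocoder parameters without a further network evaluation per frame.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: one cubic (c0, c1, c2, c3) per vocoder parameter.
Cubic = Tuple[float, float, float, float]


def expand_gesture(coeffs: Sequence[Cubic], n_frames: int) -> List[List[float]]:
    """Expand per-gesture cubic coefficients into per-frame parameter values.

    coeffs[p] holds the assumed cubic for vocoder parameter p, evaluated as
    c0 + c1*t + c2*t**2 + c3*t**3, where t runs from 0 at the first frame of
    the gesture to 1 at the last frame (an assumed normalization).
    Returns frames[f][p], the value of parameter p at frame f.
    """
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    frames: List[List[float]] = []
    for f in range(n_frames):
        # Normalized position inside the gesture; a single frame sits at t = 0.
        t = f / (n_frames - 1) if n_frames > 1 else 0.0
        # Horner's rule evaluates each cubic with three multiplications.
        frame = [c0 + t * (c1 + t * (c2 + t * c3)) for (c0, c1, c2, c3) in coeffs]
        frames.append(frame)
    return frames


# Illustrative values only: a constant parameter, a linear ramp, and a pure cubic.
example_coeffs = [
    (1.0, 0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 0.0, 1.0),
]
example_frames = expand_gesture(example_coeffs, 5)
```

Under these assumptions, the network is evaluated once per gesture (twice per phonetic segment when each gesture is half a segment), and each frame costs only a few multiply-adds per vocoder parameter.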

Keywords

neural nets; polynomial approximation; speech synthesis; vocoders; computational cost reduction; cubic polynomials; neural networks; phonetic representation; phonetic segment; series of gestures; speech modeling; speech quality; transition model; transition-based speech synthesis; vocoder parameters; Computational efficiency; Computer networks; Equations; Mean square error methods; Network synthesis; Neural networks; Polynomials; Speech synthesis; Vocoders

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on

Conference_Location

Istanbul

ISSN

1520-6149

Print_ISBN

0-7803-6293-4

Type

conf

DOI

10.1109/ICASSP.2000.859117

Filename

859117