Title :
Transition-based speech synthesis using neural networks
Author :
Corrigan, G. ; Massey, N. ; Schnurr, O.
Author_Institution :
Motorola Labs., Schaumburg, IL, USA
Abstract :
Prior attempts to use neural networks to synthesize speech from a phonetic representation have used the neural network to generate a frame of input to a vocoder. Because this requires the neural network to compute one output for each frame of speech produced by the vocoder, it can be computationally expensive. An alternative is to model the speech as a series of gestures and let the neural network generate parameters describing the transitions of the vocoder parameters during these gestures. Experiments have shown that acceptable speech quality is produced when each gesture is half of a phonetic segment and the transition model is a set of cubic polynomials describing the variation of each vocoder parameter during the gesture. This results in a significant reduction in computational cost.
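The transition model described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the normalization of time to [0, 1] within a gesture, and the coefficient values are all assumptions made for illustration. The point it shows is that one set of cubic coefficients per vocoder parameter (which the neural network would produce once per gesture) can be expanded into as many frames as the gesture needs, using only a few multiply-adds per frame.

```python
from typing import List, Sequence


def eval_cubic(coeffs: Sequence[float], t: float) -> float:
    """Evaluate c0 + c1*t + c2*t^2 + c3*t^3 using Horner's rule."""
    c0, c1, c2, c3 = coeffs
    return c0 + t * (c1 + t * (c2 + t * c3))


def expand_gesture(param_coeffs: Sequence[Sequence[float]],
                   n_frames: int) -> List[List[float]]:
    """Expand one gesture into n_frames frames of vocoder parameters.

    param_coeffs holds one (c0, c1, c2, c3) tuple per vocoder parameter.
    Assumption: time runs from 0 at the first frame of the gesture to 1
    at the last frame; the paper's actual time convention is not given
    in the abstract.
    """
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1) if n_frames > 1 else 0.0
        frames.append([eval_cubic(c, t) for c in param_coeffs])
    return frames


# Hypothetical coefficients for two vocoder parameters in one gesture:
# the first stays constant, the second rises linearly from 0 to 1.
example_coeffs = [(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)]
example_frames = expand_gesture(example_coeffs, n_frames=5)
```

In this sketch the network would be evaluated once per gesture (twice per phonetic segment, if each gesture is half a segment) rather than once per frame, which is where the cost reduction claimed in the abstract would come from.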
Keywords :
neural nets; polynomial approximation; speech synthesis; vocoders; computational cost reduction; cubic polynomials; neural networks; phonetic representation; phonetic segment; series of gestures; speech modeling; speech quality; transition model; transition-based speech synthesis; vocoder parameters; Computational efficiency; Computer networks; Equations; Mean square error methods; Network synthesis; Neural networks; Polynomials; Speech synthesis; Vocoders
Conference_Title :
Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on
Conference_Location :
Istanbul
Print_ISBN :
0-7803-6293-4
DOI :
10.1109/ICASSP.2000.859117