Title :
Statistical parametric speech synthesis using deep neural networks
Author :
Zen, Heiga ; Senior, Andrew ; Schuster, Mike
Author_Institution :
Google, Mountain View, CA, USA
Abstract :
Conventional approaches to statistical parametric speech synthesis typically use decision tree-clustered context-dependent hidden Markov models (HMMs) to represent probability densities of speech parameters given texts. Speech parameters are generated from the probability densities to maximize their output probabilities, then a speech waveform is reconstructed from the generated parameters. This approach is reasonably effective but has limitations; for example, decision trees are inefficient at modeling complex context dependencies. This paper examines an alternative scheme based on a deep neural network (DNN). The relationship between input texts and their acoustic realizations is modeled by a DNN. The use of the DNN can address some limitations of the conventional approach. Experimental results show that the DNN-based systems outperformed the HMM-based systems with similar numbers of parameters.
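Example :
The abstract describes a DNN that maps frame-level linguistic (text-derived) features to acoustic parameters. The sketch below is a minimal illustration of that idea in NumPy; the layer sizes, layer count, and activation choice are illustrative assumptions, not values taken from the paper, and no training is shown.

```python
import numpy as np

# Hypothetical dimensions -- illustrative only, not from the paper.
LINGUISTIC_DIM = 400   # answers to context questions for one frame
HIDDEN_DIM = 1024      # units per hidden layer
ACOUSTIC_DIM = 127     # e.g. spectral + excitation parameters and their dynamics

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Small random weights and zero biases for one fully connected layer."""
    return rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out)

# A feed-forward stack: hidden layers with tanh, linear output layer.
layers = [init_layer(LINGUISTIC_DIM, HIDDEN_DIM),
          init_layer(HIDDEN_DIM, HIDDEN_DIM),
          init_layer(HIDDEN_DIM, HIDDEN_DIM),
          init_layer(HIDDEN_DIM, ACOUSTIC_DIM)]

def forward(x):
    """Map a frame-level linguistic feature vector to acoustic parameters."""
    h = x
    for W, b in layers[:-1]:
        h = np.tanh(h @ W + b)
    W, b = layers[-1]
    return h @ W + b   # linear output: predicted acoustic feature vector

# One (hypothetical) frame of linguistic input -> predicted acoustic frame.
frame_features = rng.random(LINGUISTIC_DIM)
acoustic_frame = forward(frame_features)
print(acoustic_frame.shape)   # (ACOUSTIC_DIM,)
```

In the paper's setting the predicted acoustic frames would then be passed to a parameter-generation step and a vocoder to reconstruct the waveform; those stages are omitted here.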
Keywords :
hidden Markov models; neural nets; speech synthesis; HMM; acoustic realizations; decision tree clustered context dependent hidden Markov models; decision trees; deep neural networks; probability densities; speech parameters; speech waveform; statistical parametric speech synthesis; Context; Decision trees; Hidden Markov models; Neural networks; Speech; Speech synthesis; Training data; Deep neural network; Hidden Markov model; Statistical parametric speech synthesis
Conference_Title :
2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Vancouver, BC, Canada
DOI :
10.1109/ICASSP.2013.6639215