Title :
Gaussian process dynamical models for nonparametric speech representation and synthesis
Author :
Henter, Gustav Eje ; Frean, Marcus R. ; Kleijn, W. Bastiaan
Author_Institution :
School of Electrical Engineering, KTH Royal Institute of Technology, Stockholm, Sweden
Abstract :
We propose Gaussian process dynamical models (GPDMs) as a new, nonparametric paradigm for acoustic modeling of speech. GPDMs use multidimensional, continuous state spaces to overcome familiar issues with discrete-state, HMM-based speech models. The added dimensions let the state represent more than just temporal structure. They also let it capture that structure as systematic differences in the mean rather than merely as correlations in a residual, which is how dynamic features and autoregressive HMMs (AR-HMMs) model it. Because they are based on Gaussian processes, the models avoid restrictive parametric or linearity assumptions about signal structure. We outline GPDM theory and describe model setup and initialization schemes relevant to speech applications. Experiments show that the synthesized speech has subjectively better quality than speech from comparable HMMs. In addition, there is evidence that the models discover salient speech structure without supervision.
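Illustration only: a GPDM pairs one Gaussian process over latent dynamics (x_{t-1} to x_t) with a second one mapping each latent state to acoustic features (x_t to y_t). The Python/NumPy sketch below is a hypothetical minimal example and not the authors' implementation. It assumes a latent trajectory X_train has already been obtained. In a full GPDM the latent states would instead be fitted jointly with the kernel hyperparameters. It also assumes a squared-exponential kernel with fixed hyperparameters, and it synthesizes by iterating the GP posterior mean rather than by sampling. The names rbf_kernel, GPMean and synthesize, and the toy data, are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

class GPMean:
    """Posterior-mean predictor of zero-mean GP regression from inputs X to outputs Y (hypothetical helper)."""
    def __init__(self, X, Y, noise=1e-2, **kern):
        self.X, self.kern = X, kern
        K = rbf_kernel(X, X, **kern) + noise * np.eye(len(X))
        self.alpha = np.linalg.solve(K, Y)  # K^{-1} Y, reused for every prediction

    def __call__(self, x):
        # Mean prediction k(x, X) K^{-1} Y for one or more query points.
        return rbf_kernel(np.atleast_2d(x), self.X, **self.kern) @ self.alpha

def synthesize(X_train, Y_train, x0, n_steps):
    """Roll the latent state forward with the dynamics GP, then map each state to features."""
    dynamics = GPMean(X_train[:-1], X_train[1:])  # assumed first-order dynamics: x_{t-1} -> x_t
    observe = GPMean(X_train, Y_train)             # observation mapping: x_t -> y_t
    xs = [np.atleast_2d(x0)]
    for _ in range(n_steps - 1):
        xs.append(dynamics(xs[-1]))
    X = np.vstack(xs)
    return X, observe(X)

# Toy usage: a 2-D latent circle and 3-D stand-in "acoustic" features (made-up data).
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
X_train = np.stack([np.cos(t), np.sin(t)], axis=1)
Y_train = np.stack([np.cos(2 * t), np.sin(3 * t), np.cos(t) + 0.5 * np.sin(2 * t)], axis=1)
X_gen, Y_gen = synthesize(X_train, Y_train, X_train[0], n_steps=20)
print(X_gen.shape, Y_gen.shape)  # (20, 2) (20, 3)
```

Iterating the mean gives a smooth, deterministic trajectory. Drawing from the GP predictive distribution at each step would be a stochastic alternative, which is not shown here.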
Keywords :
Gaussian processes; hidden Markov models; speech synthesis; AR-HMM; Gaussian process dynamical models; HMM-based speech models; continuous state-spaces; nonparametric speech representation; nonparametric speech synthesis; outline GPDM theory; signal structure; speech acoustic models; Acoustics; Computational modeling; Noise; Speech; acoustic models; sampling; stochastic models
Conference_Title :
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
Kyoto
Print_ISBN :
978-1-4673-0045-2
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2012.6288919