DocumentCode
25202
Title
Statistical Parametric Speech Synthesis Based on Gaussian Process Regression
Author
Koriyama, Tomoki ; Nose, Takashi ; Kobayashi, Takehiko
Author_Institution
Dept. of Inf. Process., Tokyo Inst. of Technol., Yokohama, Japan
Volume
8
Issue
2
fYear
2014
fDate
April 2014
Firstpage
173
Lastpage
183
Abstract
This paper proposes a statistical parametric speech synthesis technique based on Gaussian process regression (GPR). The GPR model is designed for directly predicting frame-level acoustic features from corresponding information on frame context that is obtained from linguistic information. The frame context includes the relative position of the current frame within the phone and articulatory information and is used as the explanatory variable in GPR. Here, we introduce cluster-based sparse Gaussian processes (GPs), i.e., local GPs and partially independent conditional (PIC) approximation, to reduce the computational cost. The experimental results for both isolated phone synthesis and full-sentence continuous speech synthesis revealed that the proposed GPR-based technique without dynamic features slightly outperformed the conventional hidden Markov model (HMM)-based speech synthesis using minimum generation error training with dynamic features.
Keywords
Gaussian processes; regression analysis; speech synthesis; GPR model; Gaussian process regression; HMM; PIC approximation; articulatory information; cluster-based sparse Gaussian processes; dynamic features; frame context; frame-level acoustic features; full-sentence continuous speech synthesis; hidden Markov model; isolated phone synthesis; minimum generation error training; partially independent conditional approximation; phone information; relative position; statistical parametric speech synthesis technique; Context; Covariance matrices; Hidden Markov models; Kernel; Speech synthesis; Training; Training data; Gaussian process regression; nonparametric Bayesian model; partially independent conditional (PIC) approximation; sparse Gaussian processes; statistical speech synthesis;
fLanguage
English
Journal_Title
Selected Topics in Signal Processing, IEEE Journal of
Publisher
IEEE
ISSN
1932-4553
Type
jour
DOI
10.1109/JSTSP.2013.2283461
Filename
6609068