Title :
Contextual Additive Structure for HMM-Based Speech Synthesis
Author :
Takaki, Shinji ; Nankaku, Yoshihiko ; Tokuda, Keiichi
Author_Institution :
Dept. of Comput. Sci. & Engineering, Nagoya Inst. of Technol., Nagoya, Japan
Abstract :
This paper proposes a spectral modeling technique based on an additive structure of context dependencies for HMM-based speech synthesis. Contextual additive structure models can represent complicated dependencies between acoustic features and context labels using multiple decision trees. However, the computational complexity of context clustering is too high for the full-context labels used in speech synthesis. To overcome this problem, this paper proposes two approaches: covariance parameter tying and a likelihood calculation algorithm using the matrix inversion lemma. With these techniques, additive structure models can be applied to HMM-based speech synthesis, and speech quality is expected to improve significantly. Experimental results show that the proposed method outperforms the conventional one in subjective listening tests.
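For readers unfamiliar with the matrix inversion lemma named in the abstract, the following is a minimal sketch of its rank-one special case (the Sherman-Morrison formula), which updates a known inverse without re-inverting the full matrix. It is a general illustration only and is not the likelihood calculation algorithm of the paper; the function names and the 2x2 example values are hypothetical.

```python
def mat_vec(m, v):
    """Multiply matrix m (a list of rows) by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sherman_morrison(a_inv, u, v):
    """Return the inverse of (A + u v^T), given a_inv = A^{-1}.

    Rank-one special case of the matrix inversion lemma:
    (A + u v^T)^{-1} = A^{-1} - (A^{-1} u)(v^T A^{-1}) / (1 + v^T A^{-1} u)
    """
    n = len(a_inv)
    a_inv_u = mat_vec(a_inv, u)  # column vector A^{-1} u
    # Row vector v^T A^{-1}
    v_a_inv = [sum(v[i] * a_inv[i][j] for i in range(n)) for j in range(n)]
    denom = 1.0 + sum(vi * xi for vi, xi in zip(v, a_inv_u))
    if abs(denom) < 1e-12:
        raise ValueError("the update makes the matrix singular")
    return [
        [a_inv[i][j] - a_inv_u[i] * v_a_inv[j] / denom for j in range(n)]
        for i in range(n)
    ]


# Hypothetical example: A = diag(2, 4), so its inverse is known directly.
a_inv = [[0.5, 0.0], [0.0, 0.25]]
u = [1.0, 1.0]
v = [1.0, 1.0]
# Inverse of A + u v^T = [[3, 1], [1, 5]], obtained without a fresh inversion.
updated_inv = sherman_morrison(a_inv, u, v)
```

The update costs on the order of n^2 operations instead of the n^3 of a fresh inversion, which is the general reason the lemma is attractive when many candidate updates must be scored.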
Keywords :
covariance matrices; hidden Markov models; matrix inversion; pattern clustering; speech synthesis; HMM-based speech synthesis; acoustic features; computational complexity; context clustering; context dependencies; context labels; contextual additive structure models; covariance parameter tying; likelihood calculation algorithm; matrix inversion lemma; multiple decision trees; spectral modeling technique; speech quality; subjective listening tests; Acoustics; Additives; Computational modeling; Context; Context modeling; Decision trees; additive structure; distribution convolution; spectral modeling
Journal_Title :
Selected Topics in Signal Processing, IEEE Journal of
DOI :
10.1109/JSTSP.2014.2305919