DocumentCode
43835
Title
Synthesis of Spontaneous Speech With Syllable Contraction Using State-Based Context-Dependent Voice Transformation
Author
Chung-Hsien Wu ; Yi-Chin Huang ; Chung-Han Lee ; Jun-Cheng Guo
Author_Institution
Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng-Kung Univ., Tainan, Taiwan
Volume
22
Issue
3
fYear
2014
fDate
Mar-14
Firstpage
585
Lastpage
595
Abstract
Pronunciation variation is common in spontaneous speech and is an integral aspect of spontaneous expression. This study describes a voice transformation-based approach to generating spontaneous speech with syllable contractions for Hidden Markov Model (HMM)-based speech synthesis. A multi-dimensional linear regression model is adopted as the context-dependent, state-based transformation function to convert the feature sequence of read speech to that of spontaneous speech with syllable contraction. Because the training data are insufficient, the obtained transformation functions are categorized using a decision tree based on linguistic and articulatory features, for better and more efficient selection of suitable transformation functions. Furthermore, to cope with the small size of the parallel corpus, the trained transformation functions are cross-validated to ensure that correct transformation functions are obtained and to prevent over-fitting. Consequently, pronunciation variations of syllable contraction for both trained and unseen syllable-contracted words are generated from the transformation function retrieved from the decision tree using linguistic and articulatory features. Objective and subjective tests were used to evaluate the performance of the proposed approach. Evaluation results demonstrate that the proposed transformation function substantially improves the apparent spontaneity of the synthesized speech compared with conventional methods.
Keywords
decision trees; feature extraction; hidden Markov models; regression analysis; speech synthesis; HMM; articulatory features; context-dependent voice transformation; cross-validation; decision tree; feature sequence; hidden Markov model; linear regression; linguistic features; pronunciation variations; small parallel corpus; speech synthesis; spontaneous expression; spontaneous speech; state-based transformation function; state-based voice transformation; syllable contraction; syllable-contracted words; Decision trees; Hidden Markov models; Pragmatics; Speech; Speech synthesis; Vectors; Pronunciation variation; speech synthesis; transformation function
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
2329-9290
Type
jour
DOI
10.1109/TASLP.2013.2297018
Filename
6698295