DocumentCode
1100019
Title
Integrating Articulatory Features Into HMM-Based Parametric Speech Synthesis
Author
Ling, Zhen-Hua ; Richmond, Korin ; Yamagishi, Junichi ; Wang, Ren-Hua
Author_Institution
iFlytek Speech Lab., Univ. of Sci. & Technol. of China, Hefei
Volume
17
Issue
6
fYear
2009
Firstpage
1171
Lastpage
1185
Abstract
This paper presents an investigation into ways of integrating articulatory features into hidden Markov model (HMM)-based parametric speech synthesis. In broad terms, this may be achieved by estimating the joint distribution of acoustic and articulatory features during training. This may in turn be used in conjunction with a maximum-likelihood criterion to produce acoustic synthesis parameters for generating speech. Within this broad approach, we explore several variations that are possible in the construction of an HMM-based synthesis system which allow articulatory features to influence acoustic modeling: model clustering, state synchrony and cross-stream feature dependency. Performance is evaluated using the RMS error of generated acoustic parameters as well as formal listening tests. Our results show that the accuracy of acoustic parameter prediction and the naturalness of synthesized speech can be improved when shared clustering and asynchronous-state model structures are adopted for combined acoustic and articulatory features. Most significantly, however, our experiments demonstrate that modeling the dependency between these two feature streams can make speech synthesis systems more flexible. The characteristics of synthetic speech can be easily controlled by modifying generated articulatory features as part of the process of producing acoustic synthesis parameters.
Keywords
acoustic signal processing; feature extraction; hidden Markov models; maximum likelihood estimation; speech synthesis; HMM-based parametric speech synthesis; acoustic parameter prediction; acoustic synthesis parameter; articulatory feature; asynchronous-state model structure; cross-stream feature dependency; hidden Markov model; maximum-likelihood criterion; shared clustering system; Acoustic testing; Automatic speech recognition; Character generation; Control system synthesis; Hidden Markov models; Humans; Magnetic resonance imaging; Maximum likelihood estimation; Speech processing; Speech synthesis; Articulatory features; hidden Markov model (HMM); speech production; speech synthesis;
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2009.2014796
Filename
5109768