Regional Information Center for Science and Technology (RICeST) - Statistical modeling of syllable-level F0 features for HMM-based unit selection speech synthesis

DocumentCode :

2016523

Title :

Statistical modeling of syllable-level F0 features for HMM-based unit selection speech synthesis

Author :

Ling, Zhen-Hua ; Wang, Zhi-Guo ; Dai, Li-Rong

Author_Institution :

iFLYTEK Speech Lab., Univ. of Sci. & Technol. of China, Hefei, China

fYear :

2010

fDate :

Nov. 29 2010-Dec. 3 2010

Firstpage :

144

Lastpage :

147

Abstract :

In the current hidden Markov model (HMM) based unit selection speech synthesis method, the optimal phone-sized candidate units are selected following the maximum likelihood (ML) criterion of the HMMs trained for various acoustic features. This paper introduces statistical models for syllable-level F0 features into this method. In contrast to the frame-level F0 parameters used in the current framework, the pitch contour of the vowel in each syllable and its combination across adjacent syllables are extracted to represent the suprasegmental properties of F0. A context-dependent statistical model is trained on these syllable-level F0 features, and the likelihood function of this model is integrated into the unit selection criterion to evaluate the suprasegmental prosody of a given unit sequence. The conventional dynamic programming search algorithm for phone-sized unit selection is modified to take into account the dependency, introduced by the syllable-level F0 modeling, between the candidate units for the vowels of adjacent syllables. Our experimental results show that this method significantly improves the naturalness of synthesized speech.
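The modified search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_units` and the cost callbacks `target`, `join`, and `syl` are hypothetical, and generic additive costs (for example, negative log-likelihoods) stand in for the HMM and syllable-level F0 model likelihoods. The sketch assumes one way to handle the added dependency, namely extending each dynamic programming state with the candidate chosen for the most recent vowel, so that a cost linking the vowels of adjacent syllables can be evaluated exactly.

```python
def select_units(n_cands, is_vowel, target, join, syl):
    """Return (total_cost, path) with minimum summed cost.

    All arguments are illustrative assumptions, not taken from the paper:
      n_cands[t]           number of candidate units for phone t
      is_vowel[t]          True if phone t is the vowel of a syllable
      target(t, c)         cost of candidate c at phone t
      join(t, p, c)        cost of joining candidate p at t-1 to c at t
      syl(pt, v, t, c)     syllable-level cost between vowel candidate v
                           at phone pt and vowel candidate c at phone t
    """
    # A state is (current candidate, candidate of the most recent vowel).
    last_vowel_t = 0 if is_vowel[0] else None
    states = {}
    for c in range(n_cands[0]):
        states[(c, c if is_vowel[0] else None)] = (target(0, c), [c])

    for t in range(1, len(n_cands)):
        nxt = {}
        for (p, v), (cost, path) in states.items():
            for c in range(n_cands[t]):
                new_cost = cost + target(t, c) + join(t, p, c)
                new_v = v
                if is_vowel[t]:
                    # Add the cross-syllable term once a previous vowel exists.
                    if v is not None:
                        new_cost += syl(last_vowel_t, v, t, c)
                    new_v = c
                key = (c, new_v)
                # Keep only the cheapest path reaching each state.
                if key not in nxt or new_cost < nxt[key][0]:
                    nxt[key] = (new_cost, path + [c])
        states = nxt
        if is_vowel[t]:
            last_vowel_t = t

    return min(states.values(), key=lambda s: s[0])
```

With the extra state component, the number of states per phone grows from the number of candidates to roughly its square, which is the price of keeping the search exact under this assumption; a practical system might prune states to limit that growth.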

Keywords :

computational linguistics; dynamic programming; hidden Markov models; maximum likelihood estimation; search problems; speech synthesis; statistical analysis; acoustic feature; context dependent statistical model; dynamic programming; hidden Markov model; likelihood function; maximum likelihood criterion; pitch contour; search algorithm; speech synthesis method; suprasegmental prosody; syllable level F0 features; unit selection criterion; vowel; Acoustics; Context modeling; Feature extraction; Hidden Markov models; Speech; Speech synthesis; Training; F0 model; Speech synthesis; hidden Markov model; unit selection;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on

Conference_Location :

Tainan

Print_ISBN :

978-1-4244-6244-5

Type :

conf

DOI :

10.1109/ISCSLP.2010.5684833

Filename :

5684833

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2016523