Regional Information Center for Science and Technology - Continuous F0 in the source-excitation generation for HMM-based TTS: Do we need voiced/unvoiced classification?

DocumentCode :

2175467

Title :

Continuous F0 in the source-excitation generation for HMM-based TTS: Do we need voiced/unvoiced classification?

Author :

Latorre, Javier ; Gales, Mark J F ; Buchholz, Sabine ; Knill, Kate ; Tamura, Masatsune ; Ohtani, Yamato ; Akamine, Masami

Author_Institution :

Cambridge Res. Lab., Toshiba Res. Eur. Ltd., Cambridge, UK

fYear :

2011

fDate :

22-27 May 2011

Firstpage :

4724

Lastpage :

4727

Abstract :

Most HMM-based TTS systems use a hard voiced/unvoiced classification to produce a discontinuous F0 signal, which is then used to generate the source excitation. When a mixed source excitation is used, this decision can be based on two different sources of information: the state-specific MSD-prior of the F0 models, and/or the frame-specific features generated by the aperiodicity model. This paper examines the meaning of these variables in the synthesis process, their interaction, and how they affect the perceived quality of the generated speech. The results of several perceptual experiments show that when using mixed excitation, subjects consistently prefer samples with very few or no false unvoiced errors, whereas a reduction in the rate of false voiced errors does not produce any perceptual improvement. This suggests that rather than using any form of hard voiced/unvoiced classification, e.g., the MSD-prior, it is better for synthesis to use a continuous F0 signal and rely on the frame-level soft voiced/unvoiced decision of the aperiodicity model.
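The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single frequency band (the paper's keywords mention multi-band mixed excitation), one aperiodicity value per frame in [0, 1], and a simple linear mix of pulse and noise. The function name `mixed_excitation` and all of its parameters are hypothetical and chosen only for illustration. The point is that F0 is defined for every frame and there is no hard voiced/unvoiced switch; the aperiodicity value alone decides how noisy each frame sounds.

```python
import math
import random


def mixed_excitation(f0_hz, aperiodicity, sample_rate=16000, frame_shift=80, seed=0):
    """Single-band mixed excitation driven by a continuous F0 track.

    f0_hz:        one F0 value (Hz) per frame, defined for every frame.
    aperiodicity: one value per frame in [0, 1]; 0 is fully periodic,
                  1 is fully noisy (the "soft" voicing decision).
    Hypothetical helper for illustration only.
    """
    if len(f0_hz) != len(aperiodicity):
        raise ValueError("f0_hz and aperiodicity must have the same length")

    rng = random.Random(seed)
    samples = []
    phase = 0.0  # fraction of the current pitch period already elapsed

    for f0, ap in zip(f0_hz, aperiodicity):
        for _ in range(frame_shift):
            # Advance the pitch phase; emit a pulse each time a period completes.
            phase += f0 / sample_rate
            pulse = 0.0
            if phase >= 1.0:
                phase -= 1.0
                # Scale so the pulse train has roughly unit power, like the noise.
                pulse = math.sqrt(sample_rate / f0)

            noise = rng.gauss(0.0, 1.0)

            # No hard V/UV switch: the aperiodicity weight alone sets the mix.
            samples.append((1.0 - ap) * pulse + ap * noise)

    return samples


if __name__ == "__main__":
    # Four mostly periodic frames followed by four mostly noisy frames.
    f0 = [125.0] * 8
    ap = [0.1] * 4 + [0.9] * 4
    excitation = mixed_excitation(f0, ap)
    print(len(excitation), "samples generated")
```

In this sketch, a frame that a hard classifier would label unvoiced is simply a frame whose aperiodicity is close to 1, so the pulse component fades out rather than being switched off.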

Keywords :

hidden Markov models; speech processing; HMM-based TTS; continuous F0; source-excitation; source-excitation generation; speech signal processing; state-specific MSD-prior; voice classification; Equations; Generators; Hidden Markov models; Indexes; Mathematical model; Continuous F0; HMM-based synthesis; aperiodicity; multi-band mixed excitation; voiced/unvoiced decision;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on

Conference_Location :

Prague

ISSN :

1520-6149

Print_ISBN :

978-1-4577-0538-0

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2011.5947410

Filename :

5947410

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2175467