Title :
A new method for F0 tracking errors fix and generation in HMM-based Mandarin speech synthesis using generation process model
Author :
Wang, Miaomiao ; Wen, Miaomiao ; Hirose, Keikichi ; Minematsu, Nobuaki
Author_Institution :
Grad. Sch. of Eng., Univ. of Tokyo, Tokyo, Japan
Abstract :
HMM-based text-to-speech (TTS) systems can produce high-quality synthetic speech with flexible modeling of spectral and prosodic parameters. However, the quality of the synthetic speech degrades when the feature vectors used in training are noisy. Among the noisy features, pitch tracking errors and the corresponding flawed voiced/unvoiced (VU) decisions are the two key factors behind voice quality problems. These errors also increase the RMSE of phoneme duration. In HMM-based TTS, durations are typically modeled statistically using state duration probability distributions and duration prediction for unseen contexts. The use of rich context features enables synthesis without high-level linguistic knowledge. In this paper, an F0 generation process model is used to re-estimate F0 values in regions with pitch tracking errors, as well as in unvoiced regions. Prior knowledge of VU is imposed on each Mandarin phoneme and used for the VU decision. We also design two sets of syntax features to improve Mandarin phone and pause duration prediction, respectively.
Keywords :
hidden Markov models; mean square error methods; speech synthesis; statistical distributions; F0 generation process model; F0 tracking errors fix; HMM; Mandarin phoneme; RMSE; TTS; generation process model; probability distributions; text-to-speech system; Feature extraction; Noise measurement; Robustness; Speech; Training; F0 generation; HMM-based speech synthesis; Mandarin speech synthesis; VU error fix
Conference_Title :
Signal Processing (ICSP), 2010 IEEE 10th International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-5897-4
DOI :
10.1109/ICOSP.2010.5656850