Title :
Minimum generation error training by using original spectrum as reference for log spectral distortion measure
Author :
Wu, Yi-Jian ; Tokuda, Keiichi
Author_Institution :
Nagoya Inst. of Technol., Nagoya
Abstract :
This paper improves a minimum generation error (MGE) based HMM training technique for HMM-based speech synthesis by directly using the original spectrum, instead of line spectral pairs (LSPs), as the reference spectrum for the log spectral distortion (LSD) measure. Two types of original reference spectra for LSD calculation are investigated: the spectrum extracted from the speech waveform by STRAIGHT, and the short-time FFT spectrum calculated from the speech waveform. Since only the harmonics of the FFT spectrum coincide with the underlying spectral envelope, the LSD between the generated LSPs and the original FFT spectrum is calculated by sampling at the harmonic frequencies, and a weighting function is designed to simulate this sampling strategy on LSPs. In the experiments, MGE-LSD training using the FFT spectrum as the reference spectrum achieved the best performance.
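The harmonic-sampled LSD described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulas, so the following are assumptions: LSPs are an even-order vector of ordered frequencies in radians, the gain is supplied separately, the distance is the RMS difference in dB, and each harmonic is read from the nearest `np.fft.rfft` bin. The names `lsp_to_lpc`, `envelope_db` and `lsd_at_harmonics` are hypothetical, and the paper's weighting function and the MGE training loop itself are not reproduced.

```python
import numpy as np


def lsp_to_lpc(lsp):
    """Convert ordered LSP frequencies (radians in (0, pi), even order p)
    to LPC coefficients a = [1, a_1, ..., a_p] of A(z)."""
    lsp = np.asarray(lsp, dtype=float)
    p = lsp.size
    if p % 2:
        raise ValueError("this sketch assumes an even LSP order")
    # Trivial roots: P(z) has z = -1, Q(z) has z = +1.
    P = np.array([1.0, 1.0])
    Q = np.array([1.0, -1.0])
    for i, w in enumerate(lsp):
        # Each LSP adds a conjugate root pair on the unit circle.
        section = np.array([1.0, -2.0 * np.cos(w), 1.0])
        if i % 2 == 0:  # omega_1, omega_3, ... are roots of P(z)
            P = np.convolve(P, section)
        else:           # omega_2, omega_4, ... are roots of Q(z)
            Q = np.convolve(Q, section)
    # A(z) = (P(z) + Q(z)) / 2; the z^-(p+1) terms cancel.
    return 0.5 * (P + Q)[: p + 1]


def envelope_db(a, gain, omega):
    """All-pole log-amplitude envelope 20*log10(gain / |A(e^{jw})|) in dB."""
    n = np.arange(len(a))
    A = np.exp(-1j * np.outer(omega, n)) @ a
    return 20.0 * np.log10(gain / np.abs(A))


def lsd_at_harmonics(lsp, gain, mag_spectrum, f0, fs):
    """RMS log-spectral distance (dB) between the LSP envelope and a
    reference rfft magnitude spectrum, sampled only at multiples of f0."""
    nfft = 2 * (len(mag_spectrum) - 1)
    k = np.arange(1, int((fs / 2) // f0) + 1)  # harmonics up to Nyquist
    freqs = k * f0
    bins = np.rint(freqs * nfft / fs).astype(int)  # nearest FFT bin
    omega = 2.0 * np.pi * freqs / fs
    ref_db = 20.0 * np.log10(np.maximum(mag_spectrum[bins], 1e-12))
    gen_db = envelope_db(lsp_to_lpc(lsp), gain, omega)
    return float(np.sqrt(np.mean((ref_db - gen_db) ** 2)))


# Usage with a made-up 8th-order LSP vector and a synthetic reference
# spectrum built from the same envelope on the rfft frequency grid.
fs, nfft, f0 = 16000, 1024, 125.0
lsp = np.array([0.3, 0.5, 1.0, 1.3, 1.9, 2.2, 2.6, 2.9])
omega_grid = 2 * np.pi * np.arange(nfft // 2 + 1) / nfft
ref_mag = 10 ** (envelope_db(lsp_to_lpc(lsp), 1.0, omega_grid) / 20)
print(lsd_at_harmonics(lsp, 1.0, ref_mag, f0, fs))      # close to 0 dB
print(lsd_at_harmonics(lsp, 1.0, 2 * ref_mag, f0, fs))  # about 6.02 dB
```

Sampling only at the harmonics means spectral valleys between harmonics of a real FFT spectrum never enter the distance, which is the motivation the abstract gives. With a STRAIGHT-style smooth reference, one could instead evaluate on a uniform frequency grid.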
Keywords :
distortion; fast Fourier transforms; hidden Markov models; spectral analysis; speech synthesis; HMM-based speech synthesis; log spectral distortion measure; minimum generation error training; original spectrum extraction; short-time FFT spectrum calculation; speech waveform; Distortion measurement; Euclidean distance; Frequency; Hidden Markov models; Sampling methods; Spectral analysis; Speech analysis; Speech processing; Speech synthesis; Training data; HMM; Speech synthesis; log spectral distortion; minimum generation error;
Conference_Title :
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
Conference_Location :
Taipei
Print_ISBN :
978-1-4244-2353-8
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2009.4960508